Consociation and Dissociation
An Empirical Study of Word-Family Integration in English and German
0402
2008
978-3-8233-7384-1
978-3-8233-6384-2
Gunter Narr Verlag
Christina Sanchez
10.24053/9783823373841
CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/deed.de
Since the middle of the twentieth century it has been widely believed that English words are less integrated into word families than German words. Ernst Leisi attributed this so-called dissociation to the large proportion of Romance words that have entered the originally Germanic English language in the course of its history. Although widely held, these hypotheses have not yet been tested empirically. This book thus presents a long-overdue study which subjects the 2,500 most frequent English and German lemmas to various analyses. For instance, they are analysed into constituents to which they are both formally and semantically related. In addition, morphosemantically related complex words containing the English and German list items are sought. The approach adopted here, which considers a variety of variables such as formal differences and semantic obstacles, allows for a highly differentiated answer to the question of whether the English vocabulary is dissociated or not. The last part of the book discusses the relevance of the study's surprising results with respect to the mental lexicon as well as language learning and teaching.
Christina Sanchez
Consociation and Dissociation
An Empirical Study of Word-Family Integration in English and German
Gunter Narr Verlag Tübingen

Language in Performance (LiP) 37
Edited by Werner Hüllen and Rainer Schulze
Advisory Board: Thomas Herbst (Erlangen), Andreas Jucker (Zürich), Manfred Krug (Bamberg), Christian Mair (Freiburg i. Br.), Ute Römer (Hannover), Andrea Sand (Trier), Hans-Jörg Schmid (München), Josef Schmied (Chemnitz) and Edgar W. Schneider (Regensburg)

Bibliographic information of the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d-nb.de>.
Printed with the support of the Geschwister Boehringer Ingelheim Stiftung für Geisteswissenschaften in Ingelheim am Rhein.
D 29
© 2008 · Narr Francke Attempto Verlag GmbH + Co. KG, Dischingerweg 5 · D-72070 Tübingen
This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the consent of the publisher is prohibited and punishable by law. This applies in particular to reproduction, translation, microfilming, and storage and processing in electronic systems.
Printed on acid-free, age-resistant paper.
Internet: www.narr.de
E-Mail: info@narr.de
Printing and binding: Ilmprint, Langewiesen
Printed in Germany
ISSN 0939-9399
ISBN 978-3-8233-6384-2

I dedicate this book to my grandmother Friedel Hartmann and to my parents, Inge and Eduardo Sanchez, who have always supported me in every possible way.

So eine Arbeit wird eigentlich nie fertig, man muß sie für fertig erklären, wenn man nach Zeit und Umständen das Möglichste getan hat.
A work such as this is actually never finished. One has to declare it finished after having done what is possible with respect to time and circumstances.
Johann Wolfgang von Goethe, Italienische Reise

Acknowledgements

First and foremost, I would like to thank my supervisor, Prof. Thomas Herbst, at the Department of English of the Friedrich-Alexander-Universität Erlangen-Nürnberg, for his constant guidance as I worked on this doctoral thesis and for giving me the opportunity to work in complete freedom, while always offering helpful support whenever needed. I would also like to express my thanks to my examiners Prof. Mechthild Habermann and Prof. Franz Josef Hausmann, Friedrich-Alexander-Universität Erlangen-Nürnberg, for their valuable guidance and expertise based on their respective fields of German and Romance linguistics and for showing constant interest in the progress my thesis was making.

I am also very grateful to other people and institutions who have contributed to the present research project. Without their support, it could not have been carried out in this way. First of all, I would like to express my gratitude to the Studienstiftung des deutschen Volkes, whose scholarship enabled me to concentrate on my research. I would like to thank Dr. Alexander Geyken from the Berlin-Brandenburgische Akademie der Wissenschaften for providing me with the word frequency lists from the DWDS Core Corpus that were used in the comparison of English and German. My gratitude also goes to Franck Bodmer Mory from the Institut für deutsche Sprache in Mannheim, who compiled the word lists from the Cosmas II Corpora for me, and to Cyril Belica from the same institution for his advice on frequency band selection. I am also indebted to Prof. Adam Kilgarriff for letting me use his lemmatised BNC frequency lists and to Prof. Joan Bybee for giving me permission to reproduce a diagram from one of her publications. Furthermore, I would like to thank Prof.
Antony Unwin from the Department of Computer-Oriented Statistics and Data Analysis at the University of Augsburg for calculating a considerable number of results (including significances), for providing illustrative graphs and for his help with statistical matters in general. Further, my gratitude goes to Rosemary Zahn for her suggestions as a native speaker of English on how to improve my original text linguistically. I am also immensely grateful to Hans Rainer Fickenscher for his meticulous proofreading and his valuable advice. Furthermore, I would like to thank Peter Uhrig, Department of English at the Friedrich-Alexander-Universität Erlangen-Nürnberg, for his advice on software matters and web-based lexical searches, and I am very grateful to Dr. Eva-Maria and Dr. Johannes Pauli for their support with the Perl programming language, which was used in different sorting and retrieval processes. I would also like to express my gratitude to everyone else who is not mentioned above but who has contributed to the present book by providing inspiration or help in some way or another, among them Prof. Dieter Götz and Peter James from the University of Augsburg, James McCracken from Oxford University Press and Eveline Ohneis from Langenscheidt.

As for the printed version of my doctoral thesis, I would like to express my gratitude to the Geschwister Boehringer Ingelheim Stiftung für Geisteswissenschaften for their considerable financial support. I would like to thank Prof. Rainer Schulze for the opportunity to publish this book in the series Language in Performance and Karin Burger and Jürgen Freudl from Gunter Narr for their continued support in turning a script into a real book.

Most importantly, though, I would like to thank my whole family, particularly my mother Inge Sanchez and my father Eduardo Sanchez, for their unwavering support - be it moral, financial or of any other kind. Without them, this doctoral thesis would not have been possible.
I am also very grateful to my partner Philipp Stockhammer for our inspiring conversations and for his patience and understanding, particularly in the final stages of this doctoral thesis. Needless to say, none of the above is in any way to blame for any remaining flaws and inadequacies - I take full responsibility for them.

Christina Sanchez
Erlangen, February 2008

Index
1 Introduction to the concepts of consociation and dissociation
1.1 Definitions
1.2 Influences on Leisi’s definitions
1.3 The currency of consociation and dissociation as linguistic terms and concepts
1.4 Conclusion
2 Terminology
2.1 Consociation and dissociation
2.2 Motivation and motivatability
2.2.1 Phonetic-semantic motivation
2.2.2 Orthographic-semantic motivation
2.2.3 Morphosemantic motivation
2.2.3.1 Partial motivatability and transparency
2.2.4 Semantic motivation
2.2.5 Etymological motivation
2.2.6 Motivation by foreign elements
2.2.7 Interlingual motivation
2.2.8 Motivatability in the present study
2.3 Expandability
2.3.1 Qualitative measurement of expandability
2.3.2 Partial expandability
2.3.3 Expandability and word formation
2.3.4 Expandability in the present study
2.4 Summary
3 Design of the research project
3.1 Material
3.1.1 Preliminary considerations
3.1.2 Corpora
3.1.2.1 British National Corpus
3.1.2.2 DWDS Core Corpus
3.1.3 Criteria for the extraction of frequency lists from the corpora
3.2 Status codes
3.2.1 Excluded items
3.2.2 Shortenings
3.2.3 Proper nouns
3.2.4 Items derived from numbers, proper nouns and acronyms
3.3 Analysis of motivatability
3.3.1 Sources
3.3.2 Motivatability codes
3.3.2.1 Unmotivatable items
3.3.2.2 Partially motivatable items
3.3.2.3 Completely motivatable items
3.3.3 Principles
3.4 Analysis of expandability
3.4.1 Sources
3.4.1.1 Dictionaries
3.4.1.2 Corpora
3.4.1.3 Source codes
3.4.2 Expandability codes
3.4.3 Principles
3.5 Etymological analysis
3.5.1 Sources
3.5.2 Word origin
3.5.3 Period of first attestation
3.5.4 Principles
3.5.5 Etymological codes
4 Results
4.1 British National Corpus
4.1.1 Frequency
4.1.2 Word length
4.1.3 Part of speech
4.1.4 Morphology
4.1.4.1 Compounds
4.1.4.2 Affixes
4.1.5 Etymology
4.1.5.1 Etymological origin
4.1.5.2 Period of origin
4.1.5.3 Etymological origin and period of origin
4.1.6 Motivatability
4.1.6.1 Motivatability and frequency
4.1.6.2 Motivatability and part of speech
4.1.6.3 Motivatability and word length
4.1.6.4 Motivatability and etymological origin
4.1.6.5 Motivatability and period of origin
4.1.7 Expandability
4.1.7.1 Expandability and frequency
4.1.7.2 Expandability and source size
4.1.7.3 Expandability and word length
4.1.7.4 Expandability and etymological origin
4.1.7.5 Expandability and period of origin
4.1.8 Consociation
4.1.8.1 Consociation and frequency
4.1.8.2 Consociation and part of speech
4.1.8.3 Consociation and word length
4.1.8.4 Consociation and etymological origin
4.1.8.5 Consociation and period of origin
4.1.9 Dissociation
4.1.9.1 Dissociation and frequency
4.1.9.2 Dissociation and part of speech
4.1.9.3 Dissociation and word length
4.1.9.4 Dissociation and etymological origin
4.1.9.5 Dissociation and period of origin
4.2 DWDS Core Corpus
4.2.1 Frequency
4.2.2 Word length
4.2.3 Morphology
4.2.3.1 Compounds
4.2.3.2 Affixes
4.2.4 Etymological origin
4.2.5 Motivatability
4.2.5.1 Motivatability and frequency
4.2.5.2 Motivatability and word length
4.2.5.3 Motivatability and etymological origin
4.2.6 Expandability
4.2.6.1 Expandability and frequency
4.2.6.2 Expandability and source size
4.2.6.3 Expandability and word length
4.2.6.4 Expandability and etymological origin
4.2.7 Consociation
4.2.7.1 Consociation and frequency
4.2.7.2 Consociation and word length
4.2.7.3 Consociation and etymological origin
4.2.8 Dissociation
4.2.8.1 Dissociation and frequency
4.2.8.2 Dissociation and part of speech
4.2.8.3 Dissociation and word length
4.2.8.4 Dissociation and etymological origin
4.3 English vs. German
4.3.1 Word length
4.3.2 Frequency
4.3.3 Morphology
4.3.3.1 Compounds
4.3.3.2 Affixes
4.3.4 Etymology
4.3.5 Motivatability
4.3.5.1 Motivatability and frequency
4.3.5.2 Motivatability and etymological origin
4.3.6 Expandability
4.3.6.1 Expandability and source size
4.3.6.2 Expandability and etymological origin
4.3.7 Consociation
4.3.8 Dissociation
5 Discussion
5.1 Testing the hypotheses
5.1.1 Hypothesis 1: Motivatability is higher in German than in English
5.1.2 Hypothesis 2: Expandability is higher in German than in English
5.1.3 Hypothesis 3: German is a considerably consociated language
5.1.4 Hypothesis 4: English is a considerably dissociated language
5.1.5 Hypothesis 5: There are more Germanic than Romance words among the high-frequency lemmas
5.1.6 Hypothesis 6: Motivatability is higher in Germanic than in Romance words
5.1.7 Hypothesis 7: Consociation is higher in Germanic than in Romance words
5.1.8 Hypothesis 8: Old words are less motivatable but more expandable than recent words
5.2 Alternative results
5.2.1 A more restrictive approach
5.2.2 A less restrictive approach
5.2.3 Conclusion
5.3 Comparison with previous studies
5.3.1 Fill (1980)
5.3.2 Scheidegger (1981)
5.3.3 Summary
5.4 Limitations and countermeasures
6 Consociation and dissociation in perspective
6.1 Psycholinguistic perspective: the mental lexicon
6.1.1 Full listing vs. minimal listing
6.1.2 The effect of consociation/dissociation on the mental lexicon
6.1.2.1 The effect of motivatability
6.1.2.2 The effect of expandability
6.1.2.3 Conclusion
6.2 Didactic perspective: vocabulary learning and teaching
6.2.1 The effect of full motivatability
6.2.2 The effect of partial motivatability
6.2.3 The effect of transparency
6.2.4 The effect of expandability
6.2.5 The effect of a Romance origin
6.2.6 Conclusion and application in the classroom
6.3 Concluding remarks
7 Bibliography
7.1 Printed sources and CD-ROMs
7.2 Internet sources

Tables
Table 1: Models of motivation
Table 2: English, German, French and Spanish internationalisms
Table 3: Relative distribution of words across domains in the BNC
Table 4: Relative distribution of words across media in the BNC
Table 5: Distribution of text types in the DWDS Core Corpus
Table 6: Distribution of the DWDS word tokens across decades
Table 7: Status codes
Table 8: Motivational analyses: notation conventions
Table 9: Codes for unmotivatable items
87 Table 10: Partial motivatability codes .................................................................................. 94 Table 11: Full motivatability codes ...................................................................................... 95 Table 12: Symbols marking the source of expansions...................................................... 109 Table 13: Word family integration codes........................................................................... 110 Table 14: Supplementary codes for partial expandability............................................... 112 Table 15: The periods of the English and German languages......................................... 121 Table 16: Word origin codes ............................................................................................... 127 Table 17: Period codes ......................................................................................................... 127 Table 18: Length of the BNC analysis items in number of characters............................ 131 Table 19: Distribution across part of speech of the BNC analysis items........................ 132 Table 20: Distribution across part of speech of the 100 most frequent BNC analysis items ....................................................................................................... 133 Table 21: English prefixes and initial combining forms .................................................. 136 Table 22: English lexical suffixes and final combining forms ......................................... 137 Table 23: English grammatical suffixes ............................................................................. 139 Table 24: English affix types and tokens ........................................................................... 139 Table 25: English affixes missing from the learner’s dictionaries................................... 
140 Table 26: Etymological origin of the BNC items............................................................... 141 Table 27: Romance subcategories of the BNC items ........................................................ 143 Introduction to the concepts of consociation and dissociation 14 Table 28: Self-elaborated etymologies of the BNC items................................................. 144 Table 29: Etymological distribution and frequency of the English affix types ............. 144 Table 30: Etymological distribution of the English affix types ....................................... 144 Table 31: Etymological origin and token-type-ratio of the English affixes ................... 145 Table 32: Period of origin of the BNC items...................................................................... 147 Table 33: Period of origin and etymological origin of the BNC items ........................... 148 Table 34: The detailed English motivational codes .......................................................... 149 Table 35: Motivatability: simplified table.......................................................................... 151 Table 36: Partial motivatability: simplified table.............................................................. 152 Table 37: Part of speech and motivatability of the BNC items ....................................... 156 Table 38: Motivatable BNC items and part of speech ...................................................... 157 Table 39: Parts of speech involved in motivatability by a grammatical polyseme in the BNC items .................................................................................................. 158 Table 40: Word length and motivatability of the BNC items.......................................... 158 Table 41: Motivatability and etymological origin of the BNC items.............................. 159 Table 42: Simplified motivatability and etymological origin of the BNC items ........... 
160 Table 43: The motivatability of BNC items with a Greek origin..................................... 161 Table 44: The motivatability of BNC items with a Greek origin vs. average motivatability ....................................................................................................... 161 Table 45: The motivatability of BNC items with a French origin ................................... 161 Table 46: The motivatability of BNC items with French origin vs. average motivatability ....................................................................................................... 162 Table 47: The motivatability of BNC items with Latin origin......................................... 162 Table 48: The motivatability of BNC items with Latin origin vs. average motivatability ....................................................................................................... 162 Table 49: The motivatability of BNC items with Greek, French and Latin origin vs. average motivatability................................................................................... 163 Table 50: Motivatability and period of origin of the BNC items .................................... 163 Table 51: Expandability of the BNC items ........................................................................ 164 Table 52: Expandability of the BNC items: simplified table ........................................... 165 Table 53: Partial expandability of the BNC items: simplified table................................ 165 Table 54: Diasystematic labels of the BNC items.............................................................. 167 Table 55: Expandability of the BNC items according to their sources ........................... 168 Table 56: Frequency of the BNC expansions..................................................................... 168 Table 57: Part of speech of the BNC-expanded items vs. average.................................. 
Table 58: Expansions for English functional words
Table 59: Word length and expandability of the BNC items
Table 60: Expandability and etymological origin of the BNC items
Table 61: Expandability and period of origin of the BNC items
Table 62: Degrees of consociation of the BNC items
Table 63: Consociation and part of speech of the BNC items
Table 64: Consociation and word length of the BNC items
Table 65: Consociation and etymological origin of the BNC items
Table 66: Consociation and period of origin of the BNC items
Table 67: The dissociated English items
Table 68: Motivatability of BNC ranks 1-100 and 2,401-2,500
Table 69: Word-family integration of BNC ranks 1-100 and 2,401-2,500
Table 70: Length of the DWDS analysis items in number of characters
Table 71: German lexical prefixes and initial combining forms
Table 72: German grammatical prefixes
Table 73: German lexical suffixes and final combining forms
Table 74: German grammatical suffixes
Table 75: German interfixes
Table 76: German affix types and tokens
Table 77: Etymological origin of the DWDS items
Table 78: Self-elaborated German etymologies
Table 79: Etymological distribution of the German affix types
Table 80: Etymology and token-type-ratio of the German affixes
Table 81: The detailed motivational codes of the DWDS items
Table 82: Motivatability of the DWDS items: simplified table
Table 83: Partial motivatability of the DWDS items: simplified table
Table 84: Word length and motivatability of the DWDS items
Table 85: Simplified motivatability and etymological origin of the DWDS items
Table 86: Motivatability and etymological origin of the DWDS items
Table 87: Motivatability of the DWDS items with Greek origin
Table 88: Expandability of the DWDS items
Table 89: Expandability of the DWDS items: simplified table
Table 90: Partial expandability of the DWDS items: simplified table
Table 91: Diasystematic labels of the DWDS items
Table 92: Expandability of the DWDS items according to their sources
Table 93: Frequency of the DWDS expansions
Table 94: Word length and expandability of the DWDS items
Table 95: Expandability and etymological origin of the DWDS items
Table 96: Degrees of consociation of the DWDS items
Table 97: Consociation and word length of the DWDS items
Table 98: Consociation and etymological origin of the DWDS items
Table 99: The dissociated German items
Table 100: IDS expansions of the dissociated German items
Table 101: Contrastive length of the analysis items in number of characters
Table 102: Contrastive etymological origin
Table 103: Contrastive etymological origin of the affixes
Table 104: Contrastive token-type-ratio and etymological origin
Table 105: Contrastive motivatability
Table 106: Contrastive motivatability: simplified table
Table 107: Contrastive motivatability without compounds
Table 108: Contrastive partial motivatability
Table 109: Contrastive etymological origin of motivatable words
Table 110: Contrastive motivatability and etymological origin
Table 111: Contrastive expandability
Table 112: Contrastive partial expandability
Table 113: Contrastive expandability without compounds
Table 114: Contrastive expandability and source size
Table 115: Contrastive expandability and etymological origin
Table 116: Contrastive consociation
Table 117: Contrastive consociative strength
Table 118: Contrastive consociation and etymological origin
Table 119: Contrastive etymological origin and consociative strength: in tokens
Table 120: Contrastive etymological origin and consociative strength: in points
Table 121: Transparency in Fill’s separate study
Table 122: Motivation in previous studies
Table 123: Comparison of previous studies and the present study

Figures

Figure 1: Word length and rank of the BNC items
Figure 2: Part of speech and rank of the BNC items: lexical words
Figure 3: Part of speech and rank of the BNC items: grammatical words
Figure 4: Etymological origin and rank of the BNC items
Figure 5: Period of origin and rank of the BNC items
Figure 6: Motivatability and rank of the BNC items
Figure 7: Motivatability and rank of the BNC items: simplified version
Figure 8: Consociation and rank of the BNC items
Figure 9: Word length and rank of the DWDS items
Figure 10: Etymological origin and rank of the DWDS items
Figure 11: Motivatability and rank of the DWDS items
Figure 12: Motivatability and rank of the DWDS items: simplified version
Figure 13: Consociation and rank of the DWDS items
Figure 14: Sets of lexical connections yielding word-internal morphological structure

1 Introduction to the concepts of consociation and dissociation

In 1983 1 Meara (1983: ii) remarks that

    Despite the fact that learners themselves readily identify vocabulary acquisition as a major source of difficulty, it is an area which has largely been ignored by Applied Linguistics.

Four years later, in the second volume of Vocabulary in a Second Language, he returns to his initial statement and modifies it (Meara 1987: 1):

    Not very long ago, it was fashionable to refer to vocabulary acquisition as a “neglected aspect” of second language acquisition, and a whole series of variations on this theme, in many languages, has appeared in print. It is obvious, however, that this image is no longer a true one.

Indeed, a whole range of linguistic works discuss the question whether the study of vocabulary in general is a neglected area by referring back to Meara himself, e.g.
Schmitt and McCarthy (1997: I), who note that the field of vocabulary studies is now anything but a neglected area, and the mushrooming amount of experimental studies and pedagogical and reference material being published is enough to swamp even lexical specialists trying to keep abreast of current trends.

However, certain aspects of lexicology remain untouched by this general trend and are transmitted from one generation of linguists to another without really being questioned. The claim that English is a considerably dissociated language in contrast to the considerably consociated German language is one such aspect. Since first being stated in Leisi’s Das heutige Englisch (1955), 2 this hypothesis has become an established part of the curriculum of English linguistics as taught at German universities. The very similar claim that English is a lexicological language and less motivated than the grammatical German language even goes as far back as de Saussure’s 1916 Cours de linguistique générale, which can be regarded as the standard work from which modern linguistics originated. It therefore comes as a surprise to discover that Leisi’s hypothesis, which is based on a selection of examples only, has not been subjected to an empirical test so far. The present work attempts to fill this long-overdue gap by analysing a large number of English and German words.

1 Even before that, in 1982, Meara published an article with the eloquent title “Vocabulary Acquisition: A Neglected Aspect of Language Learning”.
2 This book has become a classic at German universities and has undergone eight editions. Even if the eighth edition of the book was slightly revised by Mair and Leisi, it will be referred to as Leisi (1999) because the passages which are relevant for the present work remain virtually unchanged.
Psycholinguistic considerations and possible didactic applications in foreign language teaching will round off the picture.

1.1 Definitions

In Chapter 10 of Das heutige Englisch, Leisi comments on the mixed character of the English lexicon and explains the concept of dissociation (Leisi 1999: 51): 3

    Die Problematik der Latinismen im Englischen liegt nicht nur darin, daß sie die Zahl der Wörter gewaltig vermehrt und zu subtilen Bedeutungsunterscheidungen geführt haben, sondern ganz besonders auch darin, daß sie außerdem eine im Englischen ohnedies bedeutsame Entwicklung gefördert haben, der wir den Namen Dissoziation geben. Der Begriff sei kurz erläutert. Die deutschen Wörter mündlich und Dreifuß stehen formal nicht allein, sondern sie sind leicht mit andern Wörtern in Beziehung zu bringen, mit denen sie formal und bedeutungsmäßig verwandt sind: mündlich mit Mund, Dreifuß mit drei und Fuß; diese Wörter sind also unter sich gewissermaßen vergesellschaftet (konsoziiert). Anders verhält es sich mit den englischen Entsprechungen oral und tripod. Diese beiden haben keine Verwandtschaftsbeziehung, die zugleich Laut und Bedeutung einschließt: Die bloß lautlich verwandten (or = oder, tripe = Kaldaunen u.a.) haben sinnmäßig nichts mit ihnen zu tun, die sinnmäßig verwandten (mouth oder stool) klingen vollkommen verschieden. Die Wörter oral und tripod gehören also nicht einer etymologischen (laut- und sinnverwandten) Familie an, sondern sie stehen allein, gleichsam asozial da. Eine Entwicklung, die in die Richtung geht, die Wörter asozial zu machen, sowie den durch sie erreichten Zustand nennen wir im folgenden Dissoziation.
3 Leisi (1999: 51): “The problem of the Latinisms in English is not only that they have greatly increased the number of words and led to subtle differences in meaning, but particularly that they have also reinforced a development which has been significant in the English language anyway, and which we call dissociation. Let me briefly illustrate the term: the German words mündlich ‘oral’ and Dreifuß ‘tripod’ do not stand alone formally, but they can easily be brought into a relation with other words they are formally and semantically related to: mündlich with Mund ‘mouth’, Dreifuß with drei ‘three’ and Fuß ‘foot’; these words constitute a lexical society among themselves, as it were (they are consociated). Not so the English equivalents oral and tripod. These two have no relation to other words which simultaneously includes their sound and their meaning: the merely phonetic relatives (the conjunction or, tripe etc.) have nothing to do with them semantically, while the semantic relatives (mouth or stool) sound completely different. Therefore, the words oral and tripod do not belong to an etymological (phonetically and semantically related) family, but they are alone and so to speak antisocial. In the following, we shall call a development which makes words become antisocial, as well as the state reached by this development, dissociation” (my translation).

After considering several examples, 4 Leisi (1999: 55) comes to the conclusion that English and French are considerably dissociated languages, while German and Italian are considerably consociated. 5 In his definition of dissociation, he had already introduced the complementary adjective consociated. With the use of the terms dissociated and consociation in the following pages, the dichotomy becomes complete in both parts of speech.
The formulations “der wir den Namen Dissoziation geben” ‘which we call dissociation’ and “nennen wir im folgenden Dissoziation” ‘in the following, we shall call dissociation’ suggest that the terminology has not been taken over from other authors, but that it has been created by Leisi himself. This is also assumed by Erades (1956-57: 210) in his review of Das heutige Englisch. Leisi gives a further reason for the dissociation of the English vocabulary, one that is related to the influx of Latinisms: while large and easily recognisable word families still existed in Old English times, just as in Modern German, he believes that language change has led to English losing a large part of its former word-formation elements, such as prefixes and suffixes. 6 Therefore, Leisi expects that English word families - when there are any - can now be found in the Franco-Latin rather than in the Germanic word material (Leisi 1999: 51-53). By contrast, German mainly introduced loan translations where English has a Latin term, e.g. in the case of prejudice vs. Vorurteil, or contradiction vs. Widerspruch (cf. Vossen 1992: 138-139).

1.2 Influences on Leisi’s definitions

Like most theories, Leisi’s statement about the dissociated English language has its predecessors - even in the author’s own writing: in his 1953 article “The Problem of the ‘Hard Words’”, Leisi already shows an interest in the mixed vocabulary of English (Leisi 1953: 262):

4 Other English words which Leisi (1999: 52) claims to be dissociated are appendix (as against consociated German Blinddarm), hippopotamus (cf. Nilpferd), perambulator (cf. Kinderwagen), galaxy (cf. Milchstraße), diminish (cf. abnehmen), syringe (cf. Spritze), cemetery (cf. Friedhof), nausea (cf. Brechreiz), synopsis (cf. Übersicht) and plenipotentiary (cf. Bevollmächtigter).
5 Leisi (1999: 55): “Gewiß ist nur, daß das Englische und das Französische zu den wesentlich dissoziierten, das Deutsche und Italienische zu den konsoziierten unter den modernen Sprachen gezählt werden können.” The translation of this passage presents the difficulty that the word wesentlich can be interpreted in more than one way: either in its established senses of ‘fundamental, considerable’, which results in the reading that English and French are fundamentally or considerably dissociated, or in the re-motivating sense of ‘by its nature’, which Leisi (1999: 58) himself comments on a few pages later, and which would lead to the interpretation that English and French are by nature dissociated.
6 Leisi claims that the old word-formation elements have been forgotten and have died (cf. Leisi 1961: 261: “vergessen und abgestorben”).

    It has often been stated that the so-called ‘hard words’ (many of the words of Latin and Greek origin) are a peculiar problem for the English language. They are - in the terms of the Geneva school - ‘mots non relativement motivés’, which means that a word like hippopotamus is not related to other words (as German Nil-pferd, Dutch nijl-paard) and thus lacks etymological support.

However, the term dissociation had not yet been coined by Leisi at that time. By contrast, most of the works indicated in the bibliography of Das heutige Englisch, 7 which may be taken to have inspired Leisi, were already available then. Leisi’s most important predecessor in stating that the words in the vocabulary of different languages are connected to different degrees is de Saussure. In his Cours de linguistique générale, he writes (de Saussure 1916/1960: 183-184): 8

    Il n’existe pas de langue où rien ne soit motivé; quant à en concevoir une où tout le serait, cela serait impossible par définition.
    Entre les deux limites extrêmes - minimum d’organisation et minimum d’arbitraire - on trouve toutes les variétés possibles. Les divers idiomes renferment toujours des éléments des deux ordres - radicalement arbitraires et relativement motivés - mais dans des proportions très variables, et c’est là un caractère important, qui peut entrer en ligne de compte dans leur classement. En un certain sens - qu’il ne faut pas serrer de trop près, mais qui rend sensible une des formes de cette opposition - on pourrait dire que les langues où l’immotivité atteint son maximum sont plus lexicologiques, et celles où il [sic] s’abaisse au minimum, plus grammaticales. […] On verrait par exemple que l’anglais donne une place beaucoup plus considérable à l’immotivé que l’allemand; […] le français est caractérisé par rapport au latin, entre autres choses, par un énorme accroissement de l’arbitraire.

If motivation is taken to be roughly equivalent to morphosemantic compositionality - a widely shared view, which is supported, amongst others, by

7 Many publications from the original bibliography at the end of the chapter on dissociation and elsewhere were omitted in the 8th edition because the new edition only aims to include works which are accessible to students at new universities (Leisi 1999: VI).
8 Translation into English by Baskin (de Saussure 1916/1974: 133-134): “There is no language in which nothing is motivated, and our definition makes it impossible to conceive of a language in which everything is motivated. Between the two extremes - a minimum of organization and a minimum of arbitrariness - we find all possible varieties. Diverse languages always include elements of both types - radically arbitrary and relatively motivated - but in proportions that vary greatly, and this is an important characteristic that may help in classifying them.
In a certain sense - one which must not be pushed too far but which brings out a particular form that the opposition may take - we might say that languages in which there is least motivation are more lexicological, and those in which it is greatest are more grammatical. […] We would see, for example, that motivation plays a much larger role in German than in English. […] with respect to Latin, French is characterized, among other things, by a huge increase in arbitrariness.”

the definitions in Bußmann (2002: 452) 9 and Herbst, Stoll and Westermayr (1991: 23-24) 10 -, it is possible to conclude that de Saussure’s claim is hyponymous to Leisi’s because it only considers the analytical direction. However, one must not overlook the fact that de Saussure’s own definition of relative motivation (de Saussure 1916/1974: 131) differs slightly from the version most widely accepted today:

    Some signs are absolutely arbitrary; in others we note, not its complete absence, but the presence of degrees of arbitrariness: the sign may be relatively motivated. For instance, both vingt ‘twenty’ and dix-neuf ‘nineteen’ are unmotivated in French, but not in the same degree, for dix-neuf suggests its own terms and other terms associated with it (e.g. dix ‘ten,’ neuf ‘nine,’ vingt-neuf ‘twenty-nine,’ dix-huit ‘eighteen,’ soixante-dix ‘seventy,’ etc.).

De Saussure (1916/1974: 132) concludes:

    The notion of relative motivation implies: (1) analysis of a given term, hence a syntagmatic relation; and (2) the summoning of one or more terms, hence an associative relation.

Thus, de Saussure sees words as relatively motivated not only by their constituents, but also by other words that are formed on the basis of these constituents. However, it would go too far to assume that motivation in the Saussurean sense may purely rely on these constituent-based associated words.
Otherwise, a reference to the other example word, vingt, being motivated by vingt-neuf, for example, might have occurred; but instead, all of the examples given involve morphological analysability as an intermediate step. In addition, motivation as it is most widely used today seems to exclude the associative aspect mentioned in the Cours de linguistique générale. Summing up, it will therefore be assumed here that de Saussure is the direct predecessor of the analytical and language-comparative aspects of dissociation only.

9 Bußmann (2002: 452): “Eine Wortbildung gilt als motiviert, wenn sich ihre Gesamtbedeutung aus der Summe der Bedeutungen ihrer einzelnen Elemente ableiten lässt, z.B. Zeitungsleser, Theateraufführung, Tischlampe.”
10 Herbst, Stoll and Westermayr (1991: 23-24): “MOTIVATION/MOTIVIERTHEIT (MOTIVATION) ist also die semantische Durchsichtigkeit eines komplexen sprachlichen Zeichens im Hinblick auf die Zusammensetzung seiner Elemente, wobei sich seine Bedeutung in gewissen Grenzen aus den Bedeutungen der Elemente ableiten läßt.”

The second most important influence on Leisi with respect to consociation and dissociation is von Wartburg. In his Einführung in Problematik und Methodik der Sprachwissenschaft, he deals with the historical fragmentation of French word families and the related loss of feeling for the internal relations between the words (von Wartburg 1943: 173-175). Thus, Latin mansionem ‘house’ became French maison, while the originally related mansionaticum ‘household’ became French ménage. Also, semantically related concepts, such as ‘blind’ and ‘blindness’, may have very different formal realisations in modern French: aveugle and cécité. Von Wartburg calls this Auflösung ‘dissolution’ and says that the groups of words in question are aufgelöst ‘dissolved’.
The similarity with Leisi’s line of argument is rather obvious, and so this passage can be taken to be a direct stimulus for the development of Leisi’s English-centred theory. What is really striking, however, is the terminology in the French translation of von Wartburg’s book by Maillard (von Wartburg 1943/1946: 172-173): 11

    En italien et en espagnol, les groupes de mots sont beaucoup moins dissociés qu’en français. [...] Cette dissociation apparaît avec une particulière netteté dans les appellations de lieu qui se rapportent les uns aux autres […].

This passage shows that the term dissociation was used in linguistics with reference to semantically related words that are formally different even before Leisi. However, while it can be safely assumed that Leisi knew about von Wartburg’s ideas when writing Das heutige Englisch - after all, the Einführung in Problematik und Methodik der Sprachwissenschaft appears in his bibliography -, it is less certain that he was familiar with the French terminology because only the German version is mentioned as a source in Leisi (1955: 67). 12 Moreover, the fact that the French translation makes use of the word dissociation before the publication of Das heutige Englisch must not be overestimated: as French is a Romance language, dissociation is a general-level word with ‘separation’ as one of its meanings 13 and may not even have been consciously used as a linguistic term by the translator. For instance, it is not defined in the relevant passage. By contrast, the coining of the German word Dissoziation from loan elements presupposes a larger degree of linguistic consciousness on the part of Leisi, who is at pains to emphasise the idea of a deteriorated lexical society. Furthermore, in German, the term Dissoziation is used in the fields of psychology, medicine and chemistry. 14 Knowledge of the latter could have inspired Leisi to draw an analogy as well.
Thus, the identity of the term with von Wartburg’s French terminology may be attributed either to pure coincidence or to a borrowing, be it conscious or subconscious.

11 von Wartburg (1943/1946: 172-173): “In Italian and Spanish, the groups of words are far less dissociated than in French. […] This dissociation is particularly obvious in the case of place names that relate to each other” (my translation).
12 It is improbable that Leisi should have wanted to spare potential student readers a French text - after all, Gilliéron (1922) is also mentioned in the bibliography (Leisi 1955: 67). However, one must not forget that the von Wartburg text was originally written in German and subsequently translated into French, which makes this a special case.
13 See Le Nouveau Petit Robert (2007) s.v. dissociation.
14 See Duden Deutsches Universalwörterbuch (2003), abbreviated to UW in the following, s.v. Dissoziation.

Apart from de Saussure and von Wartburg, Leisi’s (1955: 67-68) bibliography mentions Gneuss (1955), Grove (1949), Bally (1944) and Weisgerber (1953) as sources relating to consociation in Old English, dissociation and its consequences, and relative motivation. 15 The contributions of these works to the concepts discussed here vary. It can be assumed, for instance, that Bally’s Linguistique générale et linguistique française (1944), which also states that relative motivation is more widespread in German than in French, 16 took over the idea from the author’s teacher de Saussure, whose Cours de linguistique générale Bally published posthumously in cooperation with Sechehaye (de Saussure 1916/1960). This assumption is confirmed by the fact that Bally refers to de Saussure’s concept of a lexicological language a few pages after the passage on motivation. 17 While the contribution of Weisgerber’s Vom Weltbild der deutschen Sprache is of a less direct kind, 18 the influence of Grove (1949) on the relevant passages in Leisi (1955) is fairly obvious: several of Leisi’s examples for dissociation, such as oral, tripod and hippopotamus, ultimately come from The Language Bar. 19 Ideas from Gneuss (1955) are also directly reflected in Leisi’s Das heutige Englisch. Gneuss gives an account of the different types of loan formations in Old English and quotes Jespersen (1905), according to whom the extent to which native resources were used to express borrowed concepts in Old English is astonishing, particularly in comparison with subsequent periods (Gneuss 1955: 3).

15 Furthermore, Koziol (1937) and Gilliéron (1922) are mentioned as sources for popular etymology, and Empson (1930 and 1951) as a source for etymological revival in English.
16 Cf. Bally (1965: 341): “Mais comparons d’abord cet état de choses avec celui que nous présente l’allemand: par opposition au français, il motive abondamment et explicitement son vocabulaire et sa grammaire”, “But let us first compare this situation with that in the German language: in contrast to French, its vocabulary and its grammar are abundantly and explicitly motivated” (my translation).
17 Cf. Bally (1965: 344): “le français se rapproche du type de langue que Saussure […] appelle lexicologique”, “French comes close to the type of language that Saussure calls lexicological” (my translation).
18 Weisgerber’s passage about the Wortstände is announced to deal with relative motivation in German. By this term, Weisgerber (1953: 162) understands a group of words which have been formed by derivation and share part of their meaning (“eine Gruppe von Wörtern, die auf dem Wege der (ein- oder mehrförmigen) Ableitung gewonnen, die Wirkung einer geschlossenen inhaltlichen Leistung vollbringt”), e.g.
German derivatives in -ling that stress the human being’s incompleteness in some way or another: Säugling ‘baby’, Neuling ‘beginner’ etc. (Weisgerber 1953: 164-165).
19 Furthermore, Grove (1949: 52) speaks of the English language’s “poly-lingual vocabulary of great variety and complexity, which imposes a considerable burden upon the memory”.

Summing up, even though Leisi’s concepts of consociation and dissociation are based on the theories of his predecessors, an original and creative aspect is involved: Leisi is the first linguist who uses the two terms in one publication and defines consociation and dissociation in detail. Furthermore, his name is closely connected with the phenomenon because he defines the terms in his best-seller Das heutige Englisch.

1.3 The currency of consociation and dissociation as linguistic terms and concepts

In addition to the occurrences of the terms consociation and dissociation in Leisi’s own works, there are many publications that paraphrase his definitions - in particular the one relating to dissociation. Not only do those passages convey an impression of the degree of popularity of the idea, but the slightly modified definitions also offer implicit interpretations of the original hypothesis. Furthermore, some linguists use the term dissociation in the context of vocabulary, but with a slightly different meaning. This section attempts to trace the spread of the term and the concept of dissociation in the discipline of linguistics. The fact that Das heutige Englisch is currently in its 8th edition attests to its importance as one of the most influential books in English linguistics at German universities.
This is underlined by the fact that Leisi’s book is among the modest number of twelve works that are recommended to Bavarian students studying to be teachers at Grund-, Haupt- and Realschule and who wish to take their state examinations in English linguistics. 20 According to Barnickel (2000: 103), between the years 1981 and 2000, dissociation occurred among the approximately six examination questions included in finals papers in 1982, 1985, 1994 and 1995. In addition, dissociation was a topic in the spring examination of 2005 following the then new examination statutes. Even if this may not seem an impressively large number, it indicates that the concept of dissociation is so established that it can be considered part of the academic curriculum at German universities.

20 See Englische Sprachwissenschaft im schriftlichen Staatsexamen (2003: 6), which has been developed by the English linguistics representatives of eight Bavarian universities. However, Leisi’s book is not among those recommended to students taking the state examination for higher education teaching.

This would seem to suggest that the term should have found a place in the relevant dictionaries of linguistic terminology. But neither Asher’s The Encyclopedia of Language and Linguistics (Asher 1994) nor Bright’s The International Encyclopedia of Linguistics (Bright 1992), Crystal’s A First Dictionary of Linguistics and Phonetics (Crystal 1980) or Malmkjaer’s The Linguistics Encyclopedia (Malmkjaer 1991) mention the terms consociation or dissociation. 21 One reason for this may be that all these works are written in English, which allows for the possible explanation that the concept of dissociation has not yet spread sufficiently into the non-German-speaking world to be recorded in encyclopaedias of linguistic terminology. However, even some of the German standard reference books do not contain this pair of antonyms.
For instance, Bußmann’s Lexikon der Sprachwissenschaft (2002: 369) only has an entry Konsoziation, but with a meaning which is different from Leisi’s use of the term. 22 Conrad’s Lexikon sprachwissenschaftlicher Termini (Conrad 1985) does not mention dissociation either, and its entry Konsoziation refers to the same phenomenon as that described in Bußmann (2002). 23 The same is true of Glück (2000). Lewandowski’s Linguistisches Wörterbuch (Lewandowski 1973) also fails to include Dissoziation. Konsoziation is lemmatised, but the entry only contains a cross-reference to the lemmas Wortschatz ‘vocabulary’ and Bedeutungswandel ‘change of meaning’. In these entries, consociation is no longer mentioned, and no reference is made to Leisi. Burgschmidt’s Sprachwissenschaftliche Termini für Anglisten (Burgschmidt 1976) does not contain the term consociation either, but it is the first of the reference works considered to give a definition of dissociation. The only dictionary of linguistics that defines both consociation and dissociation is Herbst, Stoll and Westermayr’s (1991) Terminologie der Sprachbeschreibung.

21 This list of reference works for linguistic terminology is by no means exhaustive - exhaustiveness was not aimed at -, but it suffices to convey the general tendency.

22 Cf. Bußmann (2002 s.v. Konsoziation): “Eigenschaft sprachlicher Ausdrücke, die stets in derselben Kombination vorkommen, z.B. jahraus, jahrein, Leib und Seele”, “property of linguistic expressions which always occur in the same combination, e.g. year in, year out, body and soul” (my translation).

23 Consociation is defined as a fixed and recurrent combination of words; cf. Conrad (1985: 128): “feste Wortverbindung, die mit Regelmäßigkeit unverändert wiederkehrt: Essen und Trinken; der erste beste”. It is very likely that this definition refers back to Sperber’s (1923) definition of consociation, which is rephrased by von Lindheim - who criticises Leisi’s use of the term consociated because it is already used in semasiology (von Lindheim 1956: 273): “Man versteht darunter bestimmte, dem modernen Menschen oft wenig sinnvoll erscheinende Wortgruppierungen, die sich vielfach mit stereotyper Regelmäßigkeit einzustellen pflegen”, “by this are understood certain recurrent combinations of words which often seem to make little sense to people of our times, and which reappear with stereotypical regularity” (my translation).

In addition, one might expect such a widely-used concept as dissociation to be the subject of linguistic articles or monographs, and indeed, a search of the MLA International Bibliography, which goes back to 1926, 24 yields several hits, 25 but most of those belonging to the domain of linguistics concern a definition of dissociation that is related to aphasia. The article which sounds most promising, “Dissociation as a Form of Language Change” (Hickey 2000), turns out to treat a sociolinguistic phenomenon that has nothing to do with Leisi’s use of the term.

24 Cf. <http://www.bibliothek.uni-regensburg.de/dbinfo/einzeln.phtml?bib_id=ub_en&colors=255&ocolors=40&titel_id=76>, 17.11.2006.

25 The words consociation, dissociation, consociated, dissociated, Konsoziation, Dissoziation, konsoziiert and dissoziiert were searched for in both the subject guide and the title search at <http://web1.infotrac.galegroup.com/itw/infomark/0/1/1/purl=rc6_MLA?sw_aep=uben>, 17.11.2006. There are several hits for dissociation etc., but none for consociation etc.

Therefore, the implicit interpretations of the original definition must be retrieved from texts in which dissociation is not the key issue. Only Burgschmidt (1976) and Herbst, Stoll and Westermayr (1991) are special in that these dictionaries of linguistic terminology devote whole entries to the phenomenon. Burgschmidt (1976: 40) defines dissociation as follows: 26

Verlust der Durchschaubarkeit von Wörtern, bes. im Englischen der → hard words; Mehrmorphemigkeit, etwa bei: receive, inexplicable etc.; kann von nur-“native speakers” nicht aufgelöst werden

In the entry hard words, he proceeds to say (Burgschmidt 1976: 67) that this is a 27

Sammelbegriff für die Latinismen in der englischen Sprache, deren morphonologische Struktur für den nicht des Latein [sic] mächtigen Engländer schwierig ist; sie sind deshalb - als Wortbildungen zu lateinischen Wurzeln - für ihn nicht durchschaubar und somit trotz Mehrmorphemigkeit dissoziiert; z.B. conceptional, plenipotentiary u.a.; die Lexika des 17. und 18. Jh. waren besonders auf die ‘hard words’ konzentriert.

26 Burgschmidt (1976: 40) about dissociation: “Loss of transparency of words, particularly in English of the → hard words; native speakers without knowledge of other languages cannot resolve polymorphemic words such as receive, inexplicable etc.” (my translation).

27 Burgschmidt (1976: 67) about hard words: “A cover term for the Latinisms of the English language whose morphonological structure is difficult for the English speaker who does not know Latin; that is why these word-formations on Latin roots are not transparent for him and thus dissociated even though they are polymorphemic, e.g. conceptional, plenipotentiary etc.; the dictionaries in the 17th and 18th centuries concentrated particularly on the ‘hard words’” (my translation).

Burgschmidt’s use of dissociation unequivocally refers to the analytical direction only. This is understandable because Leisi himself gives merely analytical examples. Nevertheless, it is unlikely that Leisi intended consociation to be a mere synonym of motivation, and dissociation to be its antonym. After all, a few pages after the passage quoted above (Leisi 1999: 51), he speaks of the motivation of onomatopoeic words (Leisi 1999: 55) - which implies that motivation and consociation might represent different concepts. Furthermore, the formulation “leicht mit andern Wörtern in Beziehung zu bringen, mit denen sie formal und bedeutungsmäßig verwandt sind” 28 (Leisi 1999: 51), and the requirement of integration into an etymologically, phonetically and semantically related family include both an analytical and a synthetic perspective. Therefore, even if Leisi’s examples show motivation to be the dominant aspect of consociation for him, the present study will consider both perspectives but place more emphasis on the analytical direction. 29

28 Leisi (1999: 51): “they can easily be brought into a relation with other words they are formally and semantically related to” (my translation).

29 This view is also in a way supported by de Saussure (1916/1960: 182), who explains that a word such as dix-neuf is not only relatively motivated because it can be analysed into dix and neuf, but also because of its association with other figures which are formed with its constituents, such as dix-huit and soixante-dix. See below for further interpretations of consociation as a bidirectional phenomenon.

Coming back to Burgschmidt’s analytical definition, one may add that it is mainly synchronic, as it takes the native speaker into account, 30 and language-immanent - but maybe even too much so, as one may wonder why non-Latin-speakers should be incapable of recognising the naturalised English constituents concept, -ion and -al in conceptional.

30 However, it also introduces a diachronic perspective by stating that the dissociated words are not analysable any more.

By contrast, Herbst, Stoll and Westermayr’s (1991: 224) definition of dissociation does not involve any indication of directionality: 31

Erscheinung, daß im Wortschatz einer Sprache bedeutungsmäßig verwandte Wörter keine formalen Ähnlichkeiten aufweisen bzw. nicht miteinander etymologisch verwandt sind, die Beziehungen zwischen den Wörtern also nicht durchschaubar sind.

The same is true of their entry for consociation (Herbst, Stoll and Westermayr 1991: 224): 32

Erscheinung, daß im Wortschatz einer Sprache bedeutungsmäßig verwandte Wörter auch formale Ähnlichkeiten aufweisen bzw. etymologisch verwandt sind, die Beziehungen zwischen den Wörtern also durchschaubar sind.

31 Herbst, Stoll and Westermayr (1991: 224) s.v. dissociation: “The phenomenon that semantically related words in the lexicon of a language are not formally similar or etymologically related, i.e. that the relations between the words are not obvious” (my translation).

32 Herbst, Stoll and Westermayr (1991: 224) s.v. consociation: “The phenomenon that semantically related words in the lexicon of a language share formal similarities or are etymologically related, i.e. that the relations between the words are obvious” (my translation).

Their examples for consociation in Old English are intended to show that the relation between English words used to be obvious through the employment of word-formation devices such as derivational morphemes and umlaut, as is exemplified by the word family ‘go, sail’: faran, faru, faro , oferfaran, faro -r dend (cf. Herbst, Stoll and Westermayr 1991: 225). Again, there is no implication of directionality. However, directionality is implied in the adjacent passage, which explains the historical dissociation processes in English by the large influx of French and Latin words which are said to have remained isolated because they lack etymological support, the so-called hard words.
33 It is surprising that isolation as used here should consist of an analytical component only. Herbst, Stoll and Westermayr (1991: 225) also include garlic, sheriff and gospel as examples of phonetic dissociation because these words are no longer recognisable as the compounds gār-lēac ‘spear-leek’, scīr-gerēfa ‘shire-officer’ and gōd-spell ‘good tidings’. However, the present study takes a synchronic approach, which means that the words mentioned are interpreted as monomorphemic and unmotivated, but - in view of their potential expandability 34 - not necessarily as dissociated.

33 Cf. Herbst, Stoll and Westermayr (1991: 225): “die Übernahme französischer und lateinischer Wörter in großem Umfang, die isoliert blieben, da sie etymologisch ungestützt waren (sog. HARD WORDS)”. It is interesting to note that these authors explicitly consider French words as hard words as well, whereas this term is usually mainly used for words of Greek and Latin origin (cf. Burgschmidt 1976: 67, Görlach 1974: 105).

34 See Section 2.3.

Lipka (1992: 8) writes that dissociation “describes the phenomenon that words are unrelated or not associated with each other”. This definition comes close to that by Herbst, Stoll and Westermayr (1991), but as it omits the words semantic and formal, it theoretically applies to a far wider range of words, e.g. cycling and duck, house and motor or yesterday and at. However, Lipka’s examples oral, tripod etc., which are taken from Leisi, point to a reading in terms of morphosemantic analysis only.

Stein (2002: 131) reports that consociation existed in Old English times, “that is, word families consisted of Germanic derivatives and compounds”. With the influx of foreign words “this natural linguistic bond between words often became disrupted so that linguistic historians speak of a dissociated Modern English vocabulary” (Stein 2002: 131). Her definition mainly focuses on the historical aspect, i.e. the reason for the phenomenon.

Bammesberger and Grzega (1999: 53) introduce a new component to their definition, which follows a discussion of hard words such as mouth and oral: 35

Dieses Phänomen der ererbten Wörter im Substantivbereich auf der einen Seite und der entlehnten/gebildeten Wörtern [sic] im Adjektivbereich auf der anderen Seite bezeichnet man als morphologische Dissoziation <morphological dissociation> des englischen Wortschatzes. Der deutsche Wortschatz hingegen kann [...] als konsoziiert <consociated> bezeichnet werden.

35 Bammesberger and Grzega (1999: 53): “This phenomenon that there are native nouns on the one hand and learned loan adjectives on the other hand is called the morphological dissociation of the English vocabulary. The German vocabulary, however, can be termed consociated” (my translation).

Again, stress is put on the distinction native/foreign, but in this case also on the part of speech of the words affected. Confining dissociation to noun-adjective pairs would seem to be an unnecessary restriction - for instance, Vachek (1981: 274) writes that dissociation is “the fact that lexemes of different word classes semantically belonging together (such as mouth-oral, town-urban) belong to different formal bases” -, but a few pages later, Bammesberger and Grzega (1999: 58) give a more comprehensive definition of dissociation as the phenomenon that words which immediately depend on each other semantically are not formally derivable from each other (at least not synchronically). 36 The directionality of the process is left rather vague. The same is true of Kastovsky’s (1982: 20) mention of dissociation: 37

So scheint das Deutsche insgesamt eine wesentlich stärkere Tendenz zum motivierten Wort aufzuweisen als das Englische, das eher zur Dissoziation (Leisi 1960: 57ff.), zur formalen Auflösung inhaltlich begründeter Wortfamilien neigt, vgl. father : paternal (Vater : väterlich); mouth : oral (Mund : mündlich); moon : lunar base (Mond : Mondbasis) usw.

36 Cf. Bammesberger and Grzega (1999: 58): “der Umstand, daß Wörter, die semantisch unmittelbar voneinander abhängen, formal nicht voneinander abzuleiten sind (zumindest nicht synchron)”.

37 Kastovsky (1982: 20): “German seems to show a far stronger tendency for lexical motivation than English, which tends more towards dissociation (Leisi 1960: 57ff), to the formal dissolution of semantically grounded word families, cf. father : paternal (Vater : väterlich); mouth : oral (Mund : mündlich); moon : lunar base (Mond : Mondbasis) etc.” (my translation).

On the one hand, motivation is contrasted with dissociation, but on the other hand, the examples are arranged synthetically - in contrast to Leisi’s analytical order. Again, only noun-adjective pairs are mentioned, but part of speech is not a defining criterion for dissociation in Kastovsky (1982). Fill’s (1980: 29) definition, by contrast, makes the case for word-class change explicit again: 38

Der etymologisch gesehen gemischte Charakter des Englischen führte zu einer Erscheinung, die unter dem Namen D i s s o z i a t i o n in die Abhandlungen über das Englische eingegangen ist. Semantisch zusammengehörige Wörter verschiedener Wortarten (mouth: oral, town: urban) zeigen keine morphologische Verwandtschaft, sondern entstammen verschiedenen etymologischen Bereichen.

38 Fill (1980: 29): “The etymologically mixed character of English has led to a phenomenon which has entered treatises on the English language under the name of d i s s o c i a t i o n. Words with different parts of speech belonging together semantically (mouth: oral, town: urban) are not morphologically related, but have different etymological origins” (my translation).

While the derivation-like order of the examples contrasts with Leisi’s analytical arrangement, Fill (1980: 29) states that in addition to the dissociated words, there are frequently consociated transparent ones as well. 39 In spite of the preceding more general definition, this seems to imply that consociation and transparency are treated as synonyms here. 40 The same reading can be found in Fill (1988: 241):

as a result of this “double heritage of English” the phenomenon of dissociation is frequent (brother : fraternal, see : visible), but, first of all, there is frequently a dissociated and a consociated word (fraternal and brotherly), and, secondly, dissociation hardly ever amounts to isolation. Thus, fraternal is supported by fraternity and fraternize, visible by visibility, visualize, envisage and others.

39 Cf. Fill (1980: 29): “da es zusätzlich zu den dissoziierten Wörtern oft ‘konsoziierte’ durchsichtige gibt”.

This seems to imply that dissociation in Fill’s sense is not identical with isolation but only with the lack of motivation. The passage in which Hillebrand (1975: 33-34) refers to dissociation is special in that its wording comes closer to the original than all other mentions of dissociation so far: 41

Nach Leisi liegt die durch extensive Lehnvorgänge hervorgerufene Problematik der Latinismen nicht so sehr in der Anzahl der neu hinzugekommenen Wörter oder ihren subtilen Bedeutungsunterscheidungen, sondern in der durch sie bewirkten Förderung einer schon bestehenden Entwicklung mit der Tendenz, Wörter asozial zu machen, so daß sie nicht mehr einer etymologischen (laut- und sinnverwandten) Familie angehören und stattdessen dissoziiert dastehen. 74 Die radikale Dissoziierung des englischen Lexikons findet ihren besonderen Ausdruck in der Tatsache, daß ein Substantiv häufig germanischen, das zugehörige Adjektiv jedoch romanischen Ursprungs ist [...], und in Anlehnung an F. de Saussure kann man die englische Sprache aufgrund ihrer Neigung zur Unmotiviertheit als in hohem Maße ‘lexikologisch’ bezeichnen.
40 Transparency as used in the context of Fill’s definition basically corresponds to morphosemantic motivation as defined in Section 2.2.3.

41 Hillebrand (1975: 33-34): “According to Leisi, the problem of the Latinisms, which is caused by extensive borrowing, lies not so much in the great increase in the number of words or their subtle differences in meaning, but rather in the fact that they tend to reinforce an already prevalent tendency to make words become antisocial, so that they no longer belong to an etymological (phonetically and semantically related) family but are dissociated instead. The radical dissociation of the English lexicon finds its particular expression in the fact that a noun is often of Germanic origin, while its corresponding adjective is of Romance origin […], and following F. de Saussure, it is possible to call the English language highly ‘lexicological’ because of its tendency to be opaque” (my translation).

The last sentence of the passage also introduces the mixed origin of the English vocabulary and the change in form between nouns and adjectives - plus a reference to de Saussure. A very explicit view as far as the directionality of consociation is concerned is taken by Mayer (1962: 35): 42

Unter dem Gesichtspunkt der Konsoziation oder der Gruppenbildung ist [...] nicht nur die Ableitung mündlich konsoziiert; auch das Simplex Mund tritt dadurch, daß sich ein Bündel von Zusammensetzungen und Ableitungen um es lagert (mündlich, munden, Mündung, Mundart usw.), aus der Vereinzelung heraus und wird konsoziiert. Wir möchten daher [...] relative Motivation auch solchen Wörtern zusprechen, die, obwohl theoretisch unmotiviert, durch ihre Stellung in einem Gruppenverband Stützung erfahren. Wenn wir von unmotivierten Wörtern sprechen, so meinen wir damit nur Wörter, die wirklich jeglichen Gruppenzusammenhang [sic] entbehren und “asozial” dastehen.

42 Mayer (1962: 35): “Not only the derivative mündlich is consociated, but also the simplex Mund, because it is surrounded by a bundle of compounds and derivatives, such as mündlich, munden, Mündung, Mundart etc. and thus leaves the state of isolation and becomes consociated. Therefore we count words as relatively motivated which are theoretically unmotivated, but supported by their integration into a group of words. If we speak of unmotivated words, we only mean words which belong to no group at all and are ‘antisocial’” (my translation).

Mayer’s interpretation of Leisi’s definition, which sees consociation as a bidirectional phenomenon involving both analysability and expandability, coincides with the one adopted here, and with that of Barnickel (2000: 29), who accepts consociation in a synthetic sense when stating that question is consociated in questionable. However, in contrast to Mayer (1962), the present study does not view consociation and motivation as synonymous because the second term is taken to refer to the analytical direction only. Complete agreement with the quotation translated above would require replacing relatively motivated in line three by consociated and unmotivated in line four by dissociated, so that words to which neither analysability nor expandability apply would be categorised as dissociated and not as unmotivated - particularly because the quotation “antisocial” comes from Leisi’s definition of dissociation (Leisi 1999: 51).

Another discussion which also sees consociation as a bidirectional phenomenon can be found in Görlach (1974: 108), who, under the heading of dissociation, speaks of words that are etymologically isolated within the system, i.e. not recognisable as derivations or as the core of existing derivations. 43 Derivation might be understood in a relatively broad meaning here; cf. another passage from the same book (Görlach 1974: 103): 44

Wörter sind konsoziiert, wenn sie in einer durchsichtigen Ableitungsbeziehung miteinander in einer Wortfamilie vereint sind; Dissoziation ist die Isolierung des Einzelworts. Konsoziation ist die Regel in Sprachen mit hoher Frequenz an Komposita und Ableitungen (Ae., Dt., Lat.). Dissoziation tritt auf durch die Aufnahme von Lehnwörtern (11.) oder durch die Verdunklung der Ableitungsbeziehung (7.5). […] Dissoziation wird, besonders in me. Zeit, durch Neuableitungen beseitigt: ae. weorc : wyrcan : wyrhta; me. + , + . Die Tatsache, daß das Engl. eine Sprache mit ausgedehnter Dissoziation ist, steht im Gegensatz zu den weitgehenden Regularisierungen in der Flexion.

43 Cf. Görlach (1974: 108): “Wörter, die im System etymologisch isoliert sind, d. h. nicht als Ableitungen oder als Kern vorhandener Ableitungen zu erkennen sind”.

44 Görlach (1974: 103): “Words are consociated if they are united in a word family by a transparent derivational relation. Dissociation is the isolation of the individual word. Consociation is the rule in languages with a high frequency of compounds and derivatives (Old English, German, Latin). Dissociation is caused by the incorporation of loan words or by the obscuring of derivational relations. […] Dissociation is eliminated by new derivations, which was the case particularly in the Middle English period: Old English weorc : wyrcan : wyrhta; Middle English + , + . The fact that English is a language with widespread dissociation stands in contrast to its relatively regular inflection” (my translation).
Görlach states explicitly that word formations on the basis of a particular word - strictly speaking derivations at least - can have a consociating function that compensates for its loss of motivation. Scheler (1977: 119) also disagrees with Fill (1988: 241) by explicitly equating dissociation with isolation. 45 However, he brings in yet another new aspect when he claims that because dissociated words such as rhinoceros or appendicitis have no or few lexical relatives in the English lexicon, they get hardly any motivational support (Scheler 1977: 110). 46 This formulation turns Leisi’s yes-no distinction between word-family members and completely isolated words into a matter of degree. Scheler thus creates a situation in which a word that is integrated into a group of morphosemantically related words can still be classified as dissociated, a view that is rejected in the present study. Gelfert’s (2003: 75) treatment of the dichotomy is reminiscent of Scheler’s: 47

Der aus mehreren Sprachen zusammengesetzte Wortschatz führt nicht nur zur Qual der Wahl, sondern auch dazu, dass die einzelnen Wörter weniger miteinander vernetzt sind. Sie sind, wie Philologen sagen, nicht so fest <konsoziiert> wie im Deutschen, sondern <dissoziiert>.

45 Scheler (1977: 119): “D i s s o z i a t i o n ( I s o l a t i o n )”.

46 Cf. Scheler (1977: 110): “Sie [rhinoceros, appendicitis etc.] stehen, mit keinen oder nur mit wenigen lexikalischen Verwandten, im englischen Wortschatz so gut wie ungestützt da: sie sind dissoziiert.”

47 Gelfert (2003: 75): “The vocabulary, which is composed from several languages, does not only spoil the speaker for choice, but also has the effect that the individual words are less closely connected with each other. They are, as philologists say, not as strongly <consociated> as in German, but <dissociated>” (my translation).

This phrasing seems to imply that Gelfert also sees consociation as a gradable phenomenon. In such a model, words are consociated to different degrees, and dissociation corresponds to a low level. In principle, the idea that consociation increases with the number of connections of a word to other words is an acceptable one. However, due to the open-ended character of a living language, which makes it impossible to analyse a closed and fixed set of words, the present study will confine itself to stating whether a word is consociated without making any strong claims about the strength of the phenomenon. 48 Dissociation, by contrast, is defined as the complete lack of morphosemantic connections to other words.

48 See also Section 2.3.1.

In addition to the passages which translate or paraphrase Leisi’s definition, there are authors who do not mention the Swiss scholar as their source, but who nevertheless use dissociation as a linguistic term in the field of lexicology, with a meaning that is close to Leisi’s. Hughes (1988: 229), for example, writes:

The process of dissociation, the borrowing of opaque foreign terms, started soon after the Conquest and has been steadily extended, often for motives of pretence and obfuscation. In recent years Latin doctor, which supplanted Saxon leech centuries ago, is being replaced (particularly in the US) by Greek physician.

Like Leisi, Hughes mentions the foreign language element - which, as his following examples show, refers to Latin and Greek -, as well as the fact that the borrowed terms are not motivated. However, the non-integration into word families, which is an integral part of the original theory, is omitted.

The most extensive treatment of the terms consociation and dissociation outside Leisi’s work can be found in Finkenstaedt and Wolff’s Ordered Profusion: Studies in Dictionaries and the English Lexicon (1973).
Here, the original dichotomy is complemented by the term isolation (Finkenstaedt and Wolff 1973: 161):

An item is called isolated if there is no etymologically related word in the English lexicon and if there are no compounds or derivatives, e.g. bungalow. The vast majority of words imported from exotic languages are isolated; their introduction is motivated by extra-linguistic forces. Their occurrence in texts is frequently an indicator of a specific “foreign” quality about the text. The speed and degree of adaptation depend on the quality of the things denoted. Most of the isolated words are nouns.

An item in the lexicon is called consociated if it is etymologically related to another item in an evident way. Thus father and paternal are not consociated because their relationship via an Indo-European *pətḗr is only known to the philologist, who cannot be taken as the representative “average speaker”. Words like consist, resist, subsist would be recognized as belonging together even without any knowledge of Latin. […]

An item in the lexicon is called dissociated if it has a semantic but no etymological relation with another item. The words father and paternal are dissociated.

Even though the parallels to Leisi are not only of a terminological, but also of a conceptual nature, no reference to his definition is made in the context of the quotation above or in the bibliography of the book. However, it seems reasonable to assume that at the time of publication of Ordered Profusion, Finkenstaedt and Wolff were familiar with Leisi’s 1955 dichotomy. After all, they had cooperated with Leisi on A Chronological English Dictionary three years before (Finkenstaedt, Leisi and Wolff 1970). Furthermore, in his review of Ordered Profusion, Käsmann (1975: 477) states that the division of lexical items in terms of isolation, consociation and dissociation is nothing new.
Nevertheless, Finkenstaedt and Wolff’s terminology features a new subdivision of Leisi’s “antisocial” words into those which are completely isolated and those which have semantically related relatives that are formally dissimilar. Their definition also explicitly states that both compounding and derivation are factors integrating words into families, while this has to be inferred from the different types of word-formation in Leisi’s examples, such as the derivative mündlich or the compound Dreifuß.

Finkenstaedt and Wolff’s classic examples of isolation are the so-called “exotic words”, i.e. words borrowed from languages other than the quantitatively most important source languages of the English lexicon, which are Old French, Latin, Common Germanic, Old Norse and Low German (Finkenstaedt and Wolff 1973: 57). Of course, the prototypical exotic words come from outside Europe, but the above definition leads to the paradoxical situation that Modern French and Italian loans would be termed exotic even if they only represent more recent stages of Latin. According to Finkenstaedt and Wolff (1973: 58),

Words of this type are usually “isolated” words in the receiving language; they are neither members of a semantic field, nor are they self-explanatory from the etymological point of view: they are neither “consociated” nor “dissociated” etymologically.

Finkenstaedt and Wolff also create the unfortunate situation that words can be both consociated and dissociated at the same time. The following passage about 17th-century Greco-Latin neologisms explains why (Finkenstaedt and Wolff 1973: 63):

they are to a high degree etymologically related to other words already in the language, they are “consociated”. At the same time they are semantically related to English (Germanic) words and therefore “dissociated” from the etymological point of view.
For the word carnivorous, this would mean that even though it is at least morphosemantically related to carnivore and the adjectival suffix -ous, it is classified as dissociated because the English language contains the words eat and meat. Such an approach is not compatible with the present study. It poses the problem that non-native words could never be unequivocally classified as consociated because any semantically related Germanic word may be taken to dissociate a large Romance word family, and there would always be the danger of having overlooked such a word. This situation contradicts the idea of the “average speaker”, who is mentioned in the definition of dissociation. 49

49 In this respect, they are close to Leisi (1985: 246), who defines dissociation in English as the lack of semantic-formal relations to known words and/or to easy English words - which implies both a speaker for whom this is the case as well as an intra-linguistic perspective. Cf. Leisi (1985: 246): “drittens sind sie dissoziiert (nicht mit bekannten Wörtern semantisch-formell ‘vergesellschaftet’): während deutsch Flußpferd mit Fluß und Pferd assoziierbar ist, ist hippopotamus mit keinem einfachen englischen Wort konsoziiert.”

Furthermore, Finkenstaedt and Wolff’s definition of dissociation does not consider the integration into a word family, but only the relation to a particular word. The same is true of consociation. The present study, however, does not aim to compare a word with another individual word but with the whole lexicon in order to determine its position with regard to the complete society of words.

1.4 Conclusion

Leisi’s terms are widely received in Germany, but less well-known in the non-German-speaking world. This may be partly explained by the fact that they have their origins in the contrastive study of German and English. However, their possible application to other languages such as French and Italian is mentioned in Das heutige Englisch (Leisi 1999: 55), so that greater currency might have been expected.

Quite a large number of authors paraphrase Leisi’s definition of dissociation - but there are only a few instances of them doing so with the term consociation. It is interesting to note that none of the works mentioned actually quotes the original text, which means that each reference to the concept may bring in a new implicit interpretation: some authors, such as Burgschmidt, only mention the analytical direction, while others, such as Mayer and Görlach, stress the bidirectionality of the phenomenon. Semantic relatedness and formal dissimilarity are the only defining criteria for Bammesberger and Grzega, while Herbst, Stoll and Westermayr accept formal or etymological relatedness, and Finkenstaedt and Wolff even stress the etymological aspect. Burgschmidt introduces a postulated average native speaker with a synchronic and language-immanent view. Part of speech change is a criterion for Bammesberger and Grzega, Fill, Vachek and implicitly also for Kastovsky. Scheler and Gelfert introduce the idea of degree by classifying words as dissociated even if there are some - but only few - morphosemantically related words. The Finkenstaedt and Wolff definition allows for words to be simultaneously consociated and dissociated and seems to view these phenomena as relations between a target word and individual other words 50 rather than considering a word’s global (non-)integration into a word family. It also involves a representative average speaker, whose judgements determine whether words are consociated by being evidently related.

50 In the sense that one can say that word X is consociated with word Y, but dissociated from word Z.

In conclusion, dissociation is interpreted in many slightly different ways - but always as a phenomenon in which words are less connected to other words than expected.
What goes beyond this common denominator is subject to variation. As such terminological diversity calls for clarification, the next chapter is devoted to the definition of dissociation as used in the present study, but also to the definition and explanation of the most important theoretical concepts underlying the research project.

2 Terminology

The preceding chapter has shown that Leisi’s definitions of consociation and dissociation are open to interpretation and have been paraphrased with slight differences by a variety of authors. Accordingly, the understanding of these terms in the present study - which has only been mentioned in passing so far - must be explained carefully before Leisi’s hypotheses can be tested empirically. In addition, the following sections will provide definitions of the most important terminology used here.

2.1 Consociation and dissociation

The implicit interpretations of dissociation⁵¹ presented in Chapter 1 partly disagree on the following points:

- formal similarity
- semantic relatedness
- analytical aspect
- synthetic aspect
- etymological origin
- part of speech
- gradability
- possible simultaneity of consociation and dissociation
- relation to particular words
- judgement of a representative average speaker

A definition that serves as the basis for empirical testing has to take all these factors into account. In the present study, a word is consociated if it is integrated into a synchronic, language-immanent, morphosemantically related word family. This may involve either the analytical or the synthetic direction or both. As for the analytical direction, a word is consociated if it is motivatable.⁵² The synthetic criterion is fulfilled if a word is expandable.⁵³ Consociation is not primarily considered as a word’s relation to a particular other word - which could be paraphrased as “word X is consociated by word Y” - but as a relation to the whole lexicon of the language in question: “X is consociated”.⁵⁴ Words can therefore be either consociated or dissociated, but not both at the same time. Consociation may be more or less pronounced, depending on the word’s degree of motivatability and expandability. However, the present study will confine itself to detecting consociation as such. Etymological origin and part of speech are not defining criteria, but they may be an influencing factor, and their effect will be considered separately.

A word is dissociated if it is not integrated into a synchronic, language-immanent, morphosemantically related word family either in the analytical or in the synthetic direction. Dissociated words are thus neither motivatable⁵⁵ nor expandable⁵⁶. Dissociation is not considered as a word’s relation to a particular other word - “X is dissociated by Y”, which would make no sense within the approach followed here - but as a relation to the whole lexicon of the language in question: “X is dissociated”. Words can therefore be either consociated or dissociated, but not both at the same time. In contrast to consociation, dissociation is an all-or-none phenomenon. It only applies if words are completely morphosemantically isolated within a particular language. Etymological origin and part of speech are not defining criteria, but they may be an influencing factor, and their effect will be considered separately.

⁵¹ Consociation is rarely explicitly mentioned, but can usually be defined ex negativo.
⁵² See Section 2.2.
⁵³ See Section 2.3.
⁵⁴ In this context, a relation to a particular other word can serve as the basis for a statement of the kind “X is consociated at least by means of word Y”.

2.2 Motivation and motivatability

Motivation, or rather, motivatability, one of the two possible criteria for establishing consociation, has received a great deal of attention in linguistic literature - interestingly, much more so from linguists with a German or French than with an English-speaking background.
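Since motivatability is one of the two criteria, the working definitions of Section 2.1 can be restated as a simple decision rule before the criteria themselves are discussed. The following Python fragment is only a minimal sketch of that rule: the names WordProfile and classify and the placeholder lemmas are assumptions made for illustration and are not part of the study's procedure. The sketch also takes the two criteria as given; how motivatability and expandability are actually established is the subject of Sections 2.2 and 2.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordProfile:
    """Hypothetical record holding the two criteria for one lemma."""
    lemma: str
    motivatable: bool  # analytical direction (Section 2.2)
    expandable: bool   # synthetic direction (Section 2.3)


def classify(word: WordProfile) -> str:
    """Apply the rule of Section 2.1.

    A word is consociated if at least one criterion is fulfilled;
    it is dissociated only if neither is (all-or-none).
    """
    if word.motivatable or word.expandable:
        return "consociated"
    return "dissociated"


# Placeholder lemmas with made-up values, not results of the study.
examples = [
    WordProfile("X1", motivatable=True, expandable=True),
    WordProfile("X2", motivatable=False, expandable=True),
    WordProfile("X3", motivatable=False, expandable=False),
]

for word in examples:
    print(word.lemma, classify(word))
```

The sketch deliberately returns only two labels, which mirrors the decision to detect consociation as such rather than to measure its degree, and it makes a word's simultaneous membership in both categories impossible by construction.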
Even if it is generally acknowledged in modern linguistics that the linguistic sign is essentially arbitrary,⁵⁷ there are exceptions to this rule: if a word is motivated, there is an understandable relation between its form and its meaning. Consequently, motivation and arbitrariness are antonymous concepts. However, motivation is necessarily motivation for someone (Gauger 1970: 105), i.e. motivation cannot be postulated independently of the mind of the language user. As it is doubtful to what degree the kind of motivation that can be arrived at through reflection is intersubjective and actually involved in subconscious lexical processing in the mental lexicon, motivation as used here must always be understood as meaning ‘potential motivation’. This is also pointed out by Rettig (1981: 76):⁵⁸

Aufgabe des Linguisten kann es dann nicht sein, objektiv vorhandene Motivationsbeziehungen festzustellen, weil es solche objektiven Beziehungen innerhalb des Lexikons nicht gibt. Seine Aufgabe ist es vielmehr, den gesamten Umfang dessen zu erforschen und in seiner Theorie darzustellen, was mögliche Motivationsbeziehungen und was tatsächlich historisch realisierte Motivationsbeziehungen im metasprachlichen Denken, in der metasprachlichen Rede und im Sprachwissen der Sprecher sind.

He therefore prefers to speak not of motivation, but rather of motivatability⁵⁹ - an approach which is also adopted in the present study because the term is more precise. However, where reference is made to linguistic literature etc., the term motivation may be used instead.

⁵⁵ See Section 2.2.
⁵⁶ See Section 2.3.
⁵⁷ Coseriu (1968: 81-83) states that current linguistics seems to believe that Saussure was the first to formulate the theory of the arbitrariness of the linguistic sign, even though Saussure himself never claimed to be saying anything new in this respect. For a discussion of de Saussure’s predecessors, see Coseriu (1968).
⁵⁸ Rettig (1981: 76): “It cannot be the task of the linguist to recognise present motivational relations objectively because there are no such objective relations within the lexicon. The linguist’s task is rather to research and theoretically describe the whole range of possible motivational relations and actual historically realised motivational relations in metalingual thought, in metalingual discourse and in the linguistic knowledge of the speakers” (my translation).
⁵⁹ Cf. Rettig (1981: 75-76): “dann sollte man von Lexikoneinheiten auch nicht sagen, daß sie ‘motiviert’ sind, sondern daß sie ‘motivierbar’ sind.”

Table 1: Models of motivation

Ullmann (1962): phonetic motivation (primary onomatopoeia, imitating sounds; secondary onomatopoeia, movements or qualities); morphological motivation; semantic motivation
Bally (1965): motivation by the signified; motivation by the signifier (phonetic, orthographic); implicit motivation; motivation by a zero-morpheme
Lu (1998): morphological motivation; semantic motivation; phonemic motivation; graphemic motivation
Zöfgen (2002): synchronic: external (phonetic motivation), internal (morphosemantic motivation; semantic motivation: semasiological, onomasiological); diachronic: etymological motivation; wrong morphological; popular etymology

In spite of what has been said above, motivatability is not a unified concept. Different researchers have suggested different models with various subtypes of motivation, a selection of which is summarised in Table 1. The following sections will take these subdivisions, which generally differentiate between formal, morphological and semantic aspects, into account. However, the overview of the different types of motivation will not follow one particular model. Instead, it will provide an eclectic summary that serves as the basis for the working definition of motivatability adopted in the present study.
A more detailed account of how the principles outlined here are applied to the actual analyses will be given in Chapter 3.

2.2.1 Phonetic-semantic motivation

Ontogenetically and phylogenetically, it makes sense to start with what is considered to be the most basic type of motivation, namely that present in onomatopoeia. Even though onomatopoeic words do not reproduce their referent exactly - which is due to the limitations of the human articulatory organs - they imitate it sufficiently for its identification (Tournier 1985: 53). Ullmann (1962: 84) distinguishes primary onomatopoeia referring to sounds, such as crack or plop, and secondary onomatopoeia, which represent movements or qualities, such as gloom or slimy. The degree to which phonetic-semantic motivation is possible varies from one language to another because syllable structures differ (Wandruszka 1969: 23). Wandruszka (1969: 25) assumes that phonetic-semantic motivation is higher in English than in German. Still, it must not be rated too highly: according to de Saussure (1916/1983: 69), onomatopoeia are “rather marginal phenomena”, “never organic elements of a linguistic system”, and there are “far fewer than is generally believed”. Furthermore, phonetic-semantic motivation is generally only possible post festum (Wandruszka 1969: 13):⁶⁰

Eine Lautgestalt empfinden wir nur da als bedeutungsvoll, wo der Sinn uns dazu einlädt - niemand wird die phonetische Motivation eines Wortes wie Ticktack bestreiten, niemand wird sie in einem Wort wie Taktik suchen.

Similarly, nobody would want to consider the noun ring ‘piece of jewellery’ to be phonetically-semantically motivated because of its onomatopoeic verbal homonym ring that refers to the activity of a bell (Ullmann 1962: 87).
Ullmann (1962: 86) notes that “a strong element of conventionality enters into most onomatopoeic formations” - which can be exemplified by the fact that the sound of the rooster is Kikeriki in German, but quiquiriquí in Spanish and cocorico in French; not to mention the English variant cock-a-doodle-doo. Therefore, one may agree with Ricken’s (1983: 111) assumption that speakers learn onomatopoeia as units of form and meaning. This explains why it is usually not possible to guess the meaning of onomatopoeia in an unknown language (cf. the study reported in Sauvageot 1964: 180-181). Ricken notes that onomatopoeia which are integrated into the language system, such as French siffler ‘whistle’ and ronronner ‘purr’ or their German equivalents pfeifen and schnurren, are often no longer felt to be phonetically-semantically motivated (Ricken 1983: 112).

⁶⁰ Wandruszka (1969: 13): “We only interpret a phonetic form as meaningful if the meaning of the word invites us to do so - no-one will deny the phonetic motivation of German Ticktack ‘tick-tock’, but no-one will look for it in a word such as Taktik ‘tactic’” (my translation).

By contrast, some linguists consider not only whole words to be phonetically-semantically motivated, but even particular sounds or groups of sounds: Firth (1964: 184) coined the term phonaestheme for the analogical association of certain phonemes or clusters with certain meanings or meaning categories. Thus, sl- can mean ‘slippery’, ‘slimy’, etc., the gr-words grate, grunt and grind represent harsh sounds, and words with an initial pl- often suggest an impact; compare plod, plop and plonk (Adams 2001: 121). In conclusion, “the more subtle and more interesting cases [of onomatopoeia] will often be a matter of personal opinion” (Ullmann 1962: 89).
This high degree of subjectivity - particularly in words that do not refer to sounds - leads to the exclusion of phonetic-semantic motivatability from the present study; even more so because both consociation and dissociation as they are understood here deal with morphosemantically related word families only.⁶¹

2.2.2 Orthographic-semantic motivation

Bally (1965: 133) mentions a parallel to onomatopoeia in the written language: it may occur that the orthographic form of a word directly evokes its content, such as in French œil ‘eye’, whose shape - particularly the first letter combination - is reminiscent of an eyeball, or in the case of locomotive, in which one may perceive a chimney and wheels. However, even if he thinks that this phenomenon should not be neglected, he believes that most of these associations are puerile. As it can be assumed that instances of orthographic-semantic motivation are quite rare⁶² and even more subject to individual preferences than phonetic-semantic motivation, they will not be dealt with any further.

⁶¹ However, see Section 2.2.3.1 for a broadening of the meaning of morphosemantic.
⁶² Otherwise the phenomenon might be mentioned more frequently in the literature.

2.2.3 Morphosemantic motivation

The most important subcategory of motivation for the present study is the one that Ullmann (1962: 91) calls morphological motivation.⁶³ According to Bollée (1997/1998: 37 and 1995/96: 16), however, Ricken’s (1983: 111) term morphosemantic motivation is preferable.⁶⁴ She also believes that while statements about phonetic-semantic motivation are subjective, morphosemantic motivation is objective and can be described by the observer (Bollée 1997/1998: 40-41). Even though it will become clear in the following chapters that morphosemantic motivation is not without its difficulties either, one can in principle agree with her. As has already been mentioned in Chapter 1, this type of motivation is defined by de Saussure (1916/1974: 131-132):

Some signs are absolutely arbitrary; in others we note, not its complete absence, but the presence of degrees of arbitrariness: the sign may be relatively motivated. For instance, both vingt ‘twenty’ and dix-neuf ‘nineteen’ are unmotivated in French, but not in the same degree, for dix-neuf suggests its own terms and other terms associated with it (e.g. dix ‘ten,’ neuf ‘nine,’ vingt-neuf ‘twenty-nine,’ dix-huit ‘eighteen,’ soixante-dix ‘seventy,’ etc.). […] The notion of relative motivation implies: (1) analysis of a given term, hence a syntagmatic relation; and (2) the summoning of one or more terms, hence an associative relation.

While it is accepted here that morphosemantic motivation implies an analysis, the present use of the term will not include an associative relation as well because that would lead to a major overlap with the definition of consociation in Section 2.1. In this respect, the general consensus of modern linguistics that dissociates itself from Saussure’s associative aspect of motivation is followed: compare Herbst, Stoll and Westermayr’s (1991: 23-24) and Bußmann’s (2002: 452) definitions of the term motivation in Chapter 1. Not only do these definitions deal with the directionality of the phenomenon, but they also mention a second important aspect of morphosemantic motivation, namely the compositionality of meaning. This principle is also important for Gauger (1971: 8), who recognises the morphosemantic constituents Garten ‘garden’, agentive -er and Haus ‘house’ in German Gärtner ‘gardener’ and Gartenhaus ‘garden shed’.⁶⁵ Lu (1998: 36) believes that anyone who knows the meaning of the morphemes tea and spoon will also understand the compound teaspoon. However, it must not be forgotten that this requires a great deal of extralinguistic knowledge: instead of being ‘a small spoon that you use for mixing sugar into tea and coffee’,⁶⁶ a teaspoon could also be a spoon made of tea, or a spoon on which the word tea has been engraved. This observation fits with a remark by Seppänen (1978: 138):⁶⁷

Jedes Nominalkompositum, das einen gemeinschaftlich etablierten Wert bezeichnet, ist trotz seiner scheinbaren Analysierbarkeit zwangsläufig ein weitgehend arbiträres Sprachzeichen.

⁶³ The example words used in the following sections’ theoretical discussions may not necessarily belong to the words analysed in the present study. By contrast, the example words used in Chapter 3 are all among the 5,000 analysis items.
⁶⁴ This is also the reason why the two previous types of motivation have been referred to as phonetic-semantic and orthographic-semantic, and not only as phonetic and orthographic. The distinction will become important in Section 2.2.3.1, where apparent morphological analysability regardless of semantic aspects, i.e. transparency, will be distinguished from morphosemantic motivation.
⁶⁵ He calls them “durchsichtige Wörter” ‘transparent words’ because their formal-semantic make-up allows the speakers to look through them, to understand their structure and to explain them in this way (cf. Gauger 1971: 8: “weil ihre formal-inhaltliche Beschaffenheit es den Sprechenden erlaubt, durch sie hindurch zu sehen, sie gleichsam zu ‘durchschauen’ und sie - eben dadurch - zu erklären”).
⁶⁶ See the Longman Dictionary of Contemporary English, abbreviated to LDOCE in the following, s.v. teaspoon.

Thus rat-poison has the conventional meaning ‘poison for the extermination of rats’, while snake-poison usually designates ‘poison produced by a snake’ (Hoeksema 2000: 853). Gauger (1971: 148) considers German Briefmarke ‘stamp’ a motivated word because it is possible to recognise its constituents Brief ‘letter’ and Marke ‘tag’.
However, he criticises the word as misleading because stamps can also be used for parcels, and because the decisive criterion, namely the function of enabling an object to be transported by the post, is missing. This criticism can be attenuated with reference to Schippan (1974: 218), who notes that even though Sitzung ‘conference’ is motivated by sitzen ‘to sit’, it is characterised by more essential features than those signalled by its constituents. She concludes that the meaning derived from the constituents is not necessarily identical with the meaning of the lexeme (Schippan 1974: 222). This line of argument is also supported by Gauger (1970: 99).⁶⁸ According to him, motivation is not identical with absolute causation or justification; it only means that the meaning of a word contains a motive for its phonological shape. The same point is made by Laca (1986: 65), who distinguishes the Wortbildungsbedeutung, i.e. the meaning of a word that can be justified on the basis of a language’s system of rules, from the Wortschatzbedeutung, which is a fixed and commonly used variant of the Wortbildungsbedeutung. Whatever the case, the viewpoint taken here would not want to consider a word such as Sitzung completely unmotivatable, as it is possible to include its free constituent in a meaningful paraphrase, such as ‘a meeting during which people typically sit down to discuss important matters’. The fact that this extension of meaning as compared to the original definition⁶⁹ includes an irrelevant aspect may be no problem if Béjoint (1999: 84) is right in saying that compounds never seem to be satisfactory on the formal level:

Total transparency may not exist for compound nouns that have only two or three elements, because the elements can only represent part of the semantics of the whole. For example, the form plastic bag describes the material (“a bag which is made of plastic”), but this is not enough to define the concept, which must include the function (“a bag made of plastic which is used by shops …”).

⁶⁷ Seppänen (1978: 138): “Every nominal compound that designates a commonly established value is to a large extent an arbitrary linguistic sign in spite of its apparent analysability” (my translation).
⁶⁸ Cf. Gauger (1970: 99): “Der Begriff der Motivation meint keine restlose Verursachung oder Rechtfertigung; er meint nur, daß im Inhalt bzw. im Intentum ein Motiv für die Form der Lautung enthalten sei.” Gauger (1971: 133) distinguishes three different types of motivation, which will be explained in this order: ausgreifend ‘reaching out’, verschiebend ‘shifting’ and variierend ‘varying’. Thus, French pommier ‘apple tree’ can be related to pomme ‘apple’. The complex word representing the first type of motivation and the base word designate different entities. By contrast, tendre ‘tender’ and tendresse ‘tenderness’ have the same meaning except for the fact that they differ in their part of speech. Here, the transformation is of an intralinguistic kind. The third type of motivation applies to pairs such as maison ‘house’ and maisonnette ‘small house’, where the complex word only represents a specific variation of the base.

To this can be added Hansen’s (1978: 247-248) observation, which is based on the assumption that compounds show a certain affinity with syntactic units: he believes that ambiguity can arise particularly if the verb in the compound is not expressed, but only implied. Thus, in noun-noun compounds such as honeybee, the verb “has to be deduced from the semantic roles that the nominal constituents can have in relation to each other” (Hansen 1978: 249). In the present example, this could be either eat or produce. “Which of the two interpretations actually applies can, in this case, only be decided by recourse to our factual knowledge of the denotatum” (Hansen 1978: 249).
Usually, the constituents of a word imply that there is a determining relationship, but they do not reveal what kind of relationship (Gauger 1971: 154-155).⁷⁰ Levi (1978: 76-77) mentions the following relations between the constituents of compounds:

- cause, e.g. in tear gas
- have, e.g. in lemon peel
- make, e.g. in silkworm
- use, e.g. in steam iron
- be, e.g. in soldier ant
- in, e.g. in field mouse
- for, e.g. in horse doctor
- from, e.g. in olive oil
- about, e.g. in abortion vote

⁶⁹ Compare the Duden Universalwörterbuch s.v. Sitzung: “Versammlung, Zusammenkunft einer Vereinigung, eines Gremiums o. Ä., bei der über etw. beraten wird, Beschlüsse gefasst werden”, “meeting or gathering of an association, a committee etc. in which an issue is discussed and/or decisions are taken” (my translation).
⁷⁰ Cf. Gauger (1971: 154-155): “Das Wort selbst sagt uns also nur, daß hier ein Determinationsverhältnis vorliegt, nicht aber welcher Art dieses ist”.

The important role that world knowledge plays in the recognition of a word’s motivation is complemented by that of context in the understanding of compounds. Heringer (1984: 46, 50) gives the following example: German Fischfrau is typically understood as ‘woman selling fish’ if there is no context, but it might as well mean ‘the fish’s wife’, ‘mermaid’, ‘fish-eating woman’, ‘woman descending from fish’, ‘woman who has brought the fish’, ‘woman who has shouted fish’, ‘woman whose sign of the zodiac is Pisces’,⁷¹ ‘woman standing near the fish’, ‘woman looking like a fish’ etc. If the speakers do not know each other, they will rely on their common cultural knowledge and interpret Fischfrau as a woman selling fish. The situational knowledge in the context “Februar, aha, also Fischfrau!”,⁷² however, or the episodic knowledge that the speaker has a weakness for esoteric practices may lead to a different reading, namely the zodiac option (Heringer 1984: 48-50).
The variety of alternative readings that are theoretically possible for a complex word leads Heringer (1984: 51) to the following conclusion:⁷³

Nicht alles muß ausgedrückt sein. Ja, alles kann gar nicht ausgedrückt sein. Das meiste muß entgegen kleinmütigen Annahmen sowieso implizit bleiben. Deshalb sollten wir die Verstehensfähigkeiten des Menschen nicht unterschätzen. Unsere Deutungskompetenz arbeitet andauernd und überall. In Tropfsteinhöhlen sehen wir Menschenstatuen, Feen, Dome, Kreuze, Fenster; in Rorschachklecksen entdecken wir kämpfende Monster, drohende Augen, liebliche Grotten. Genauso können wir neue und kreative Wortbildungen deuten, ja wir müssen sie deuten, weil sich ein Verständnis einfach einstellt. [...] Genauigkeit ist relativ, und nicht überall ist Genauigkeit am Platz. [...] Einziges und oberstes Prinzip der Wortbildung muß Verstehbarkeit sein: Alles, was sinnvoll ist, ist möglich.

⁷¹ But actually, this meaning is more likely to be expressed by the compound Fischefrau. An Internet search of both terms in combination with Horoskop ‘horoscope’ yielded 1,430 hits for Fischefrau, but only 692 for Fischfrau (<http://www.google.de>, 30.10.2006).
⁷² This context can be translated as “February - oh, then you/she must be a …”.
⁷³ Heringer (1984: 51): “Not everything needs to be expressed - actually, it is even impossible to express everything. In contrast to pusillanimous assumptions, the largest part has to remain implicit anyway. That is why we should not underestimate the human ability to understand. Our interpretative competence is active everywhere and at all times. In stalactites and stalagmites, we recognise human statues, fairies, cathedrals, crosses and windows; in the ink blots of a Rorschach test, we discover fighting monsters, threatening eyes or pleasant grottoes. In the same way, we can interpret new and creative word formations - we even have to because an understanding of their meaning simply sets in. […] Precision is relative and it is not always called for. […] The only and most important principle of word formation has to be understandability. Everything that makes sense is possible” (my translation).

Summing up, in many or even most complex words, neither all important semantic features nor the relation between the constituents are unequivocally expressed; complex words are imprecise by nature. However, working out their meaning generally poses no problem to the speakers, who are assisted by the complex words’ conventionality, by the linguistic and extralinguistic context⁷⁴ and by their own world knowledge. Consequently, once it is possible to include a complex word’s constituents in a sensible and satisfactory paraphrase of its meaning, the word in question will be considered to be fully morphosemantically motivatable in the present study, even if different interpretations are possible.

⁷⁴ Cf. Bauer (2000: 837): “Using a complex word in context […] is usually sufficient to specify which of its multitude of possible meanings is the intended one.”

Käge (1980: 20), for instance, considers German Geburtstag ‘birthday’, Eisbrecher ‘icebreaker’ and Haustür ‘front door’ to be only partially motivated: Geburtstag generally does not refer to the day of one’s birth but to its yearly commemorative celebration, Eisbrecher is above all a ship, and Haustür is not any one of the doors of a house but only the one leading into it. In the present model, by contrast, these words are fully motivatable. After all, the components of these compounds are in an understandable relation to the meaning of the whole: the birthday is ‘the day on which one’s birth is celebrated’, an icebreaker is ‘a ship that breaks the ice in order to pass through’, and a “house-door” is ‘the door leading into a house’. A well-known English example is given by Lipka (1992: 3): although jail and prison are synonyms, their derivatives have different meanings.
A jailer is a person who is in charge of a prison, and a prisoner is a person who is kept in a prison. “From the point of view of the system of the language, it could equally well be the other way round, namely that the jailer was inside and the prisoner outside the prison” (Lipka 1992: 3). This is due to the fact that even though one can consider both words to be fully motivated, there is a remainder of indeterminacy because “the suffix only denotes someone who has something to do with what is denoted by the derivative base.” These examples show very clearly that full motivation is not equivalent to absolute motivation. 75 De Saussure (1916/ 1974: 132) has two good reasons for using the term relative motivation when dealing with morphosemantically motivated words: even in the most favorable cases motivation is never absolute. Not only are the elements of a motivated sign themselves unmotivated (cf. dix and neuf in dixneuf), but the value of the whole term is never equal to the sum of the value of the parts. Teach + er is not equal to teach X er […]. The fact that dix-neuf means ‘19’, i.e. ‘10 + 9’, and not ‘90’, i.e. ‘10 x 9’, is arbitrary and conventional (Hoeksema 2000: 851). As a consequence, all morphosemantic motivation must of necessity be relative. So, if the term full motivatability is used in the remainder of this text in order to designate the highest degree of relative motivatability, this is merely for reasons of simplicity of expression and in order to avoid calling this phenomenon full relative morphosemantic motivatability. 75 In fact, absolute motivation in the strictest sense of the word does not seem to exist because it would require a word to be universally understandable, which is not even the case for onomatopoeia (cf. Section 2.2.1). Motivation and motivatability 47 2.2.3.1 Partial motivatability and transparency Unfortunately for researchers, the existence of motivatability is seldom a yes-no question. 
76 De Saussure (1916/ 1974: 132), who says that “motivation varies, being always proportional to the ease of syntagmatic analysis and the obviousness of the meaning of the subunits present”, had already noticed this. As a consequence, it is necessary to recognise an intermediate category between full motivatability and complete arbitrariness. Following the terminology of Gauger (1971: 137), all intermediate categories will be referred to with the attribute partial; hence the name partial motivatability for the category mentioned above. 77 Even if such instances are marked in the study in order to distinguish them from full motivatability and the lack of motivatability, partial motivatability will count as a legitimate consociating factor in the final results. For the formal side, this can be justified by two facts: firstly, the first morpheme in Leisi’s (1999: 51) own example mündlich deviates from the free word Mund in the first vowel’s umlaut. Secondly, de Saussure (1916/ 1974: 134) writes that “Latin inim cus recalls inand am cus and is motivated by them” - in spite of the fact that the complex word is inim cus and not *inam cus. Neither of the two fathers of dissociation, then, seems to expect motivating elements to be formally absolutely identical with the segments in the corresponding word. In addition, slight differences between the meaning of constituents and the meaning of complex words are also tolerated by Leisi when he claims that Flusspferd ‘hippopotamus’ is motivated by Fluss ‘river’ and Pferd ‘horse’ (Leisi 1999: 55). Therefore, partial motivatability can be accepted as a valid consociating factor. To rephrase de Saussure’s (1916/ 1974: 132) above statement in a slightly different way, motivational analyses may be faced with two kinds of obstacles, namely those concerning the form and those concerning the meaning of a word - and of course, possible combinations of the two. 
76 Compare Lu (1998: 49).
77 In analogy to what was said about full motivatability in the preceding chapter, partial motivatability is shorthand for relative partial morphosemantic motivatability. According to Fleischer and Barz (1995: 18), motivational scales with three degrees are widely used - with varying terminology they are used among others by Käge (1980: 12-13), Augst, Bauer and Stein (1977: 8) and Ulrich (1972: 285).

Formal difficulties can in turn be split up into those affecting the spoken form and those affecting the written form of a word. A close look at Das heutige Englisch (Leisi 1999: 51) conveys the impression that Leisi stresses the phonetic aspect in his definitions; consider the fragments “keine Verwandtschaftsbeziehung, die zugleich Laut und Bedeutung einschließt” ‘no relation to other words which simultaneously includes their sound and their meaning’ and “gehören keiner etymologischen (laut- und sinnverwandten) Familie an” ‘do not belong to an etymological (phonetically and semantically related) family’. His example tripe for a phonetically related word emphasises this aspect, too, as the final <e> in this word graphically dissociates it from the written word tripod - at least very slightly. This may be attributed to the fact that spoken language precedes written language both ontogenetically and phylogenetically. In the passage that relates dissociation to first language acquisition (Leisi 1999: 55), the signifiant is also repeatedly characterised as lautlich ‘phonetic’. Nevertheless, one must not overlook the first part of Leisi’s definitions, which includes the passages “stehen formal nicht allein” ‘do not stand alone formally’ and “formal und bedeutungsmäßig verwandt” ‘formally and semantically related’ (Leisi 1999: 51; my emphasis).
This more comprehensive phrasing is also adopted by Herbst, Stoll and Westermayr (1991: 224), whose definition of dissociation involves formally similar or etymologically related words, and by Bammesberger and Grzega (1999: 58), who use the expression “formally derivable”. None of the definitions quoted in Chapter 1 refers to the spoken language only - instead, the wording gets around the problem. Since Leisi’s lautlich thus offers the possibility of being interpreted as shorthand for ‘formal with stress on the phonetic side’, it makes sense to include both the spoken and the written variety in the research: after all, first language learners acquire a large part of their advanced vocabulary through reading, and second language learners frequently encounter words in their written form for the first time or at least soon afterwards. So, even if consociation and dissociation are interpreted as involving both the spoken and the written form of words, the marking of any differences in these modalities will maintain the option of considering one aspect only in separate calculations.

However, before the different formal aspects involved in partial motivatability can be discussed, it is necessary to introduce the concept of transparency. The present work will basically follow the distinction set up in Ronneberger-Sibold (2002: 105-106): 78

Als transparent oder durchsichtig wird im Folgenden ein morphologisch komplexes Wort bezeichnet, dem man eine wörtliche Bedeutung 2 [also called Wortbildungsbedeutung or systematische Bedeutung] zuordnen kann. Nach dieser Definition sind also z.B. sowohl Blaubeere als auch Junggeselle transparente Wörter, denn man kann ihnen die wörtlichen Bedeutungen ‘Blaue Beere’ bzw. ‘Junger Geselle’ zuordnen. Als motiviert sollen solche transparenten Wörter dann gelten, wenn ihre wörtliche Bedeutung zu ihrer referentiellen Bedeutung 3 [also called Lexikonbedeutung or Gebrauchsbedeutung] paßt. In diesem Sinne ist Blaubeere motiviert, denn Blaubeeren sind tatsächlich blaue Beeren, während Junggeselle ‘unverheirateter Mann’ nicht (mehr) motiviert ist, da nicht jeder, und nicht einmal der prototypische unverheiratete Mann ein junger Geselle ist.

78 Ronneberger-Sibold (2002: 105-106): “In the following, a morphologically complex word will be called transparent if it is possible to assign a literal meaning to it. According to this definition, both Blaubeere ‘blueberry’ and Junggeselle ‘bachelor’ are transparent words because it is possible to assign the literal meanings ‘BLAUE BEERE’ ‘blue berry’ and ‘JUNGER GESELLE’ ‘young journeyman’ to them respectively. Words will be counted as motivated if their literal meaning fits with their referential meaning. In this sense, Blaubeere ‘blueberry’ is motivated because blueberries are actually blue berries. Junggeselle ‘unmarried man’, however, is not motivated (any more) because not every and not even the prototypical bachelor is a young journeyman” (my translation).

At this point, a few terminological aspects need to be clarified because the usage of the terms transparent and motivated is not standardised in any way: some authors restrict themselves to one of the two terms, e.g. Hausmann (2002a and 2002b), who employs transparency only, and Zöfgen (2002), who prefers motivation; other authors use both words synonymously, e.g. Shaw (1979: 11) and Käge (1980: 7). An argument in favour of calling words whose constituents are formally related to the complex form transparent is the fact that the term itself is relatively transparent in a metaphorical way. By contrast, the term motivated may evoke unwanted associations of a diachronic kind, and in addition, its frequent use in psychology and didactics, which has entered the general vocabulary, 79 also activates a strong rival meaning.
Nevertheless, these problems should be overcome by a definition stating for instance that the term relates to the synchronic level only (cf. Section 2.2.8). One may also argue that the potentiality expressed in the term motivatability, which is used in the present study, makes this a more suitable option than motivation, so that the unwanted associations are no longer evoked.

Instead of contrasting motivatability with transparency, it would certainly have been possible to use a term such as Zöfgen’s (2002: 189) falsche Motiviertheit ‘wrong motivation’, or von Polenz’s (1967: 79) irreführende Motivierung ‘misleading motivation’ for formally but not semantically similar words. However, the differences between lexical relations that involve a semantic component in addition to formal similarity and those that do not - or not necessarily - are not as obviously reflected in such terminology as when terms formed from different roots are used. Moreover, the repeated use of a clear but extremely long term such as pseudo-motivatability would make the text slightly awkward to read. Apart from this increase in reader-friendliness, there are some additional arguments in favour of the distinction taken over from Ronneberger-Sibold: firstly, there do not seem to be any linguistic works that use the terms motivation and transparency in exactly the reverse way; secondly, this procedure can also be interpreted as a tribute to German philology, which has provided many stimulating ideas for the present study; 80 and thirdly, motivation is used here in the modified de Saussure definition.

79 Compare the meanings of motivation recorded in LDOCE.
80 Note that both Bußmann’s Lexikon der Sprachwissenschaft (2002) and the Metzler Lexikon Sprache (Glück 2000), which can be considered important reference works in German linguistics, define motivation in the sense used here while omitting the term transparency completely.
Therefore, the term transparency will be used in the following to refer to the formal analysability of complex words into existing word-formation elements in spite of semantic unrelatedness or if only the formal aspects are considered. In order for a word to be fully transparent from a phonetic point of view, its constituents have to be completely integrated into the longer form. This pertains to both the individual sounds and the word accent. If a word fails to meet any of these conditions, it is only partially phonetically transparent. So measurement is fully phonetically analysable into measure and -ment, and in spite of minor orthographic differences, daily is fully motivatable by day and -ly. Similarity and cigarette, however, are instances of partial phonetic transparency only because the vowels in the constituents similar and cigar are realised slightly differently, and because the accent is not on the same syllable as in the complex word, as can be seen in the transcriptions and , for example. In analogy to what was said about phonetic analyses, words are only fully orthographically transparent if the chains of letters of the constituents can be completely integrated into the complex word. So publisher is fully orthographically transparent with respect to publish and -er. The fact that electricity and collection are pronounced slightly differently from electric and collect does not prevent these words from attaining the highest level of orthographic transparency. By contrast, removal and worried are only partially orthographically transparent because the final letters of their constituents remove and worry are omitted and changed respectively. The examples treated so far are all fully semantically motivatable and only involve either phonetic or orthographic difficulties. As there is thus full transparency in either of the two varieties, such words may still be counted as fully motivatable. 
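The orthographic criterion just described lends itself to a simple mechanical check. The following Python sketch is merely an illustration and not part of the procedure used in the study. It assumes that the constituents of a word are supplied by the analyst, and it simplifies “completely integrated” to the requirement that the letter chains of the constituents, joined in order, yield the complex word. The function name and the suffix forms -ity, -ion, -al and -ed are assumptions introduced for the example.

```python
def orthographic_transparency(word, constituents):
    """Classify a word as 'full' or 'partial' with respect to its constituents.

    Simplifying assumption (not the study's own procedure): a word counts as
    fully orthographically transparent only if the letter chains of its
    constituents, joined in order, reproduce the complex word exactly.
    """
    # Drop the hyphens that mark affixes (e.g. "-er") and ignore case.
    chain = "".join(part.strip("-").lower() for part in constituents)
    return "full" if chain == word.lower() else "partial"


# Examples taken from the text; the segmentations are supplied by hand.
examples = {
    "publisher": ["publish", "-er"],
    "electricity": ["electric", "-ity"],
    "collection": ["collect", "-ion"],
    "removal": ["remove", "-al"],
    "worried": ["worry", "-ed"],
}

for word, parts in examples.items():
    print(word, orthographic_transparency(word, parts))
```

On this simplified criterion, daily (day + -ly) would likewise come out as only partially orthographically transparent, which corresponds to the minor orthographic differences mentioned above. The phonetic side is not modelled here because it would require transcriptions of both the constituents and the complex word.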
Many problematic cases, though, involve a combination of phonetic and orthographic difficulties. Consider the following words, which are quite evidently semantically related to the elements listed after them: demonstrat|ion - demonstrate, capabil|ity - capable and explan|ation - explain. The members of each pair share enough phonemes and graphemes to be readily recognised as belonging together. If a word is fully motivatable from a semantic point of view, but only partially formally transparent due to simultaneous phonetic and orthographic deviations, it is labelled as partially motivatable. This can be justified by the fact that the consociating link is less strong than in cases of perfect coincidence.

However, this raises the question of how heat - hot, patience - patient, plaintiff - complain, pride - proud and proof - prove, which pose even larger formal difficulties, should be treated. From a purely formal point of view, it is not possible here to speak of a complex word which is analysable into simpler constituents. On the other hand, these word pairs are so obviously related semantically, and the similarities in pronunciation and spelling are so evident, that one would not really want to call these words alone or antisocial (as in Leisi 1999: 51). Therefore, they are considered fully motivatable semantically, but only partially formally transparent; a view which is supported by Augst (1975: 27-31): he found that non-linguists do not segment series such as Gebot - Verbot ‘commandment - prohibition’ and gelingen - misslingen ‘succeed - fail’, but that they put these words next to each other unanalysed. Gefieder ‘plumage’ and Feder ‘feather’ are also placed side by side in spite of the language user’s inability to describe the formal change by a rule.
A similar statement with respect to English can be found in Das heutige Englisch (Leisi 1999: 53): conceive - conception, perceive - perception and receive - reception are quoted as a case in point for Franco-Latin word families within the English language. However, on the same page, these very examples make Leisi lament the frequent absence of the Romance base words - from which the whole family is derived - in the English language. From this, it can be concluded that it is possible to speak of word families in Leisi’s sense even in cases where the base word is missing, e.g. in the pairs exclude - include and sustain - maintain.

But there are even more problematic cases from a formal point of view: even though it is possible to argue that tale and tell share so few phonemes and graphemes that the first word can hardly be considered partially motivatable by the second one, their semantic relatedness constitutes a very important criterion. Furthermore, tale and tell are very short and, unlike oral - mouth, not completely unrelated formally. In its written form, tale shares half its letters in a particular position with the motivating tell; in the spoken form, the proportion is either identical or even higher, depending on the treatment of the diphthong. Therefore, tale can be counted among the partially motivatable words as well. Yet this does not mean that motivatability is recognised in all semantically related word pairs exhibiting a minimal formal similarity. For instance, gehen and Gang are too distant in spite of their shared initial because all the remaining linguistic material is different. 81

Another question that still needs to be settled is that of directionality. So far, it has been stated without any explanation that pride is motivatable by proud. As it is impossible to confirm this decision on the basis of morphological complexity, one must ask whether it would not be possible to argue for the opposite direction as well.
A similar problem can be observed in the case of so-called zero-derivations, e.g. the noun balance and the verb to balance. Gauger (1971: 141) believes that when the subconscious compares verbs and nouns, it instinctively makes a decision which goes in one of the two directions in most cases. 82 Transferred to the problem mentioned above, this means that even if ‘the feeling of pride’ is a conceivable definition for proud, it appears that proud is the more basic concept. For instance, the learners’ dictionary LDOCE, which aims at simplicity and user-friendliness, defines pride as ‘a feeling that you are proud of something that you or someone connected with you has achieved’, while the definition of proud does not involve the word pride. One could assume that this intuitive principle is also valid for other word classes. Where a particular directionality suggests itself, one partner of such a semantically related word pair is classified as partially motivatable, and the other one as partially expandable: 83 thus pride is partially motivatable by proud, and proud is partially expandable into pride. Where both directions are plausible, e.g. in the case of the zero-derived noun and verb a smile and to smile, both directions are valid: the noun can be motivated by the verb, and the verb can be motivated by the noun. 84 However, this kind of motivating relationship is only admitted if no smaller constituents can be discerned. Where the common formal base of semantically related complex words is not a free lexeme in English, the treatment is analogous, so that include and exclude can both be partially motivated by each other.

81 Similarly, if legal is stripped of its suffix, the similarity to law is only minor, as merely the initial sound and letter are shared. However, the complex word still contains a motivating suffix.
At first sight, this procedure may appear surprising, but it is very much in accord with the ideas of Bybee (1995: 429), according to whom morphological relations in the mental lexicon are constituted by “parallel sets of phonological and semantic connections”. 85

82 Cf. Gauger (1971: 141): “Unwillkürlich trifft das Bewußtsein bei der Gegenüberstellung von Verb und Substantiv fast immer eine in diese oder in jene Richtung gehende Entscheidung.”
83 Compare Section 2.3.2 for the discussion of partial expandability.
84 In cases such as the above, the motivating word is not admitted as a partial expansion as well.
85 For more information on consociation/dissociation and the mental lexicon, see Section 6.1.

As far as semantics is concerned, the present approach is synchronic, so that it is not possible to rely on established reference works such as etymological dictionaries in order to determine whether a word is made up of particular constituents that are plausible with respect to their meaning. Therefore, the guiding principle is Augst’s synchronic etymological competence (Augst 1998: IX-X), which is defined as

eine Fähigkeit des “normalen” Sprachteilhabers, komplexe Wörter zu zerlegen, bis er (bei Ableitungen) bei einem nicht mehr zerlegbaren Kernwort ankommt. [...] Wichtig ist dabei, dass wir nicht das Wissen von Fachleuten berücksichtigen; der “normale” Sprachteilhaber wird als “Laie” verstanden, der nicht auf besondere fachliche, fremdsprachliche oder gar sprachwissenschaftliche Kenntnisse zurückgreifen kann. 86

This approach appears to be highly compatible with Leisi and Finkenstaedt and Wolff, 87 and in spite of its drawbacks, 88 it may allow conclusions for the mental lexicon. 89 Being completely synchronic, it admits all plausible solutions, regardless of whether they are etymologically correct or not. The results of the analyses correspond to a reflecting language user (cf.
Gauger 1970: 104) and aim to discover a maximum of motivatability within these limitations. 90 Even though a very strong semantic link between each constituent and the meaning of the complex item is the basic requirement for motivatability, there is still enough scope for different semantic phenomena that render the categorisation more complex:

1. Even though all the constituents of a complex word are semantically related to the complex item, a vital constituent can be missing, e.g. in German zugleich ‘at the same time’, which can be analysed into the paraphrase-suitable constituents zu ‘at’ and gleich ‘same’, yet nothing reveals that the centre of interest is time. 91 Similarly, Schneider ‘tailor’ is related to the verb schneiden ‘cut’ and the agentive suffix -er because cutting cloths and threads is certainly part of a tailor’s job, but this is not central enough to derive the word’s Wortschatzbedeutung from its Wortbildungsbedeutung. 92

2. The polysemy or even homonymy of a word’s constituents may render the search for a relation to the complex word’s meaning more complicated. For instance, the first element in the lexeme Rentenmarkt ‘bond market’ is ambiguous because Rente can mean both ‘retirement’ and ‘bond’. 93

3. The referent of exocentric compounds such as Blauhelm ‘UN soldier’ - literally ‘blue helmet’ - cannot be deduced from the determinatum because the relation between the constituents is hard to determine without a certain degree of world knowledge. However, the same is actually true of most if not all complex items. Thus, a bedroom could also be ‘a room in the shape of a bed’, and the word network could be misinterpreted to mean ‘a type of work that is carried out in a net’.

While it is possible to draw the conclusion that complete motivatability is never possible, the meanings of all the constituents mentioned above can be related to their corresponding complex word’s meaning.

86 Augst (1998: IX-X): “an ability of the ‘normal’ language users to analyse complex words until they arrive at core words which cannot be analysed any further (in the case of derivatives). […] In this respect, it is important that we do not consider experts’ knowledge; the ‘normal’ language user is understood as a layperson who cannot rely on particular knowledge of subject fields, foreign languages or even linguistics” (my translation).
87 Cf. their requirement that consociated items should be related to other items in an evident way for the representative average speaker (Finkenstaedt and Wolff 1973: 161).
88 For instance, the normal language user is an abstraction that does not exist (cf. Ickler 1999: 298) - just as Chomsky’s (1965: 3) ideal speaker (cf. the famous quotation according to which “Linguistic theory is concerned primarily with an ideal speaker-listener, in a completely homogeneous speech-community, who knows its language perfectly and is unaffected by such grammatically irrelevant conditions as memory limitations, distractions, shifts of attention and interest, and errors (random or characteristic) in applying his knowledge of the language in actual performance”). The researcher can therefore only attempt to carry out analyses that come close to this postulated ideal.
89 Cf. Marslen-Wilson et al. (1994: 26-27): “The average listener has no access to the diachronic history of a word and will only mentally represent it as morphologically complex if this gives the right compositional semantics. Any linguistic analysis of the morphology of English must, therefore, be filtered through this synchronic criterion before it can be interpreted in terms of actual mental representations of words in the language.”
90 According to Sauer (1982: 468), the naive attitude towards language supports opacity, while the reflecting attitude aims at motivation.
91 Note, though, that one lexical unit of gleich can also have the temporal meaning ‘immediately’.
92 Compare Section 2.2.3. While the Wortschatzbedeutung of Schneider is ‘tailor’, the Wortbildungsbedeutung would rather be paraphrased as ‘cutter’. Consequently, even if the verb and the suffix -er are known, a person unaware of the derivative’s meaning could hardly ever arrive at the usual reading. The same is true of the English word income, whose constituents are metaphorically related to the meaning of the complex item, but the monetary aspect is missing.
93 In this particular case, the analysis is further complicated by the fact that the meaning that is relevant for the compound is the second, less frequent one. Affixes can also have a large range of different meanings (cf. Motsch 1999: 151-152) - which is the case here, too. According to Laca (1986: 170), agentive and instrumental fall together in one suffix not only in English and German - in the case of -er -, but also in other languages.

The question that the present study tries to answer is neither “What could/would/should the complex word’s meaning be if the meaning of its constituents is taken as the basis?” nor “Is it possible to arrive at this word’s meaning if only the meanings of its constituents are known?”, but rather “Are the meanings of all the constituents directly reflected in the complex word’s meaning?”. This approach is not without its difficulties either, but taking the existent material as the starting point that has to be accounted for in some way puts a certain limit on the number of alternative solutions, while the other questions formulated above would increase the degree of speculation. Therefore, it can be stated once more that from a purely semantic point of view, full motivatability applies by definition to any complex word whose constituents are all directly related to the meaning of the complex word in a plausible way. 94

However, deciding whether a paraphrase is sensible is not always an easy task: is proportion motivatable by portion, as in ‘a portion considered in relation to the whole’? Can interview be analysed into inter- and view because it is ‘an exchange of views’? Is the noun account related to the verb to count because ‘the money on an account is counted’? Does responsibility involve ‘having to respond for what happens’? And are prototypical butchers big and strong - to put it differently, butch? 95 Sporadic questioning revealed disagreement on the aforementioned examples. However, all these words have formally obvious constituents that are to some degree related to the whole word’s meaning - which is why one would not want to consider them completely unmotivatable - but not obviously enough to categorise them as fully motivatable either. As a compromise, they are all considered partially motivatable on semantic grounds.

In general, words with no motivating constituents are interpreted as monomorphemic because there is no reason why one should postulate constituents if the semantic cause is missing. 96 However, there may be words that are not motivatable, but still transparent. For example, a ladybird is neither a lady nor a bird, nor is there any immediately conceivable connection between the three words. Still, Ullmann (1962: 91) states that in spite of the obscured relations, “it is none the less obvious that such words are morphologically motivated.” Lipka (1992: 74-79), too, recognises that there are complex lexemes whose constituents are not meaningful morphemes, but rather what he calls formatives, i.e. pseudo-morphemes. This category applies to complex words, such as understand, phrasal and prepositional verbs, such as hold up and put up with, and idioms, such as to pull someone’s leg or to blow a raspberry. Of these, only the complex words are of relevance to the present study.
Even if the lack of semantic compatibility excludes such items from the group of consociated words, they are marked in order to find out about their proportion within the vocabulary.

94 Due to the vague nature of semantic boundaries, this feature is more error-prone than the others.
95 Most of the examples mentioned are even etymologically correct. However, considering an etymological dictionary in doubtful cases is not an option here because it would exert a diachronic influence on the synchronic study.
96 Even though this does sometimes occur in popular etymology, e.g. in the case of mushroom, which has nothing to do with either mush or room (Shorter Oxford English Dictionary s.v. mushroom).

A related category between full morphosemantic motivatability and its complete absence is described by Ronneberger-Sibold (1997: 1783):

In a few cases, such as [German] letal ‘lethal’, where the adjective was borrowed directly from Latin, but not the corresponding noun (letum ‘death’), such an adjective does not have a synchronically independent derivational base. Due to the parallelism with all the other adjectives in -al, language users will nevertheless be able to divide it into -al and a base let-, which they happen not to know.

Kandler and Winter (1992: 35) also split off meaningful elements, even if unknown segments remain. Accordingly, words consisting of one part that is fully or partially morphosemantically motivatable and another part which is not will be considered partially motivatable. In addition, a distinction is made between free and bound motivating elements. Thus, words can not only be related to each other through their bases, but also through their affixes. 97 For example, repeat contains the prefix re- ‘again’, just as reconsider, but is not motivatable by its base, and Leisi’s example oral also contains the Latin-based element or- as well as the recurrent, to some degree motivating, suffix -al. In classifying such words - e.g.
German reparieren ‘repair’ and renovieren ‘renovate’ - as partially motivatable by means of their prefixes and suffixes, von Polenz’s model (1967: 78) has been adhered to. This is kept apart from cases such as yesterday, which can obviously be motivated by the free morpheme day, but whose element *yester is not a synchronically existent English word. Similarly, the unique morphemes cran-, rasp- and Him- in cranberry, raspberry and Himbeere ‘raspberry’ may be assumed to express what distinguishes the different berries from each other. This hypothesis is supported by Gauger (1971: 23), who believes that language consciousness need not necessarily be able to assign a clear meaning to a sequence of sounds before recognising it as a morpheme, but that a strong feeling of dealing with a morpheme is of crucial importance here. 98

97 Affix is used in the present study as a term comprising not only prefixes and suffixes but also what are sometimes called combining forms (see Marchand 1960: 87 and Tournier 1985: 88 for criticism of that term and its inconsistent application).
98 Cf. Gauger (1971: 23): “Zur Entscheidung der Frage, ob eine gewisse Anzahl zusammenhängender Laute einen Inhalt habe, ob sie also ein ‘Morphem’ bilde, ist es nicht vonnöten, daß das Sprachbewußtsein in der Lage ist, jenen Inhalt mehr oder weniger eindeutig [...] anzugeben; ausschlaggebend ist vielmehr der dem Bewußtsein sich unwillkürlich aufdrängende Eindruck, daß eine Lautreihe etwas bedeutet; ein Eindruck, der uns erst dazu anhält und es uns erst ermöglicht, nach einem ‘Inhalt’ zu fragen.”

2.2.4 Semantic motivation

Stephen Ullmann (1962: 91-92) introduces yet another type of motivation, which is based on semantic factors and therefore called semantic motivation: 99

When we speak of the bonnet or the hood of a car, of a coat of paint, or of potatoes cooked in their jackets, these expressions are motivated by the similarity between the garments and the objects referred to. In the same way, when we say the cloth for the clergy […] or ‘town and gown’ for ‘town and University’, there is semantic motivation due to the fact that the garments in question are closely associated with the persons they designate. Both types of expression are figurative: the former are metaphorical, based on some similarity between the two elements, the latter are metonymic, founded on some external connexion.

The present study will not devote particular attention to this phenomenon because it is believed that the figurative employment of a word does not cause major problems, as it merely involves an extension of the word’s meaning but no change in form. The newly created lexical unit 100 is still part of the original lexeme, and so the question of formal-semantic word-family integration does not pose itself. However, figurative language use does play a role in morphosemantic motivatability and expandability in that the constituents of particular words may be related to the complex word by the intermediate step of metaphor or metonymy: thus, jacket potato could be classified as motivatable because jacket is a very obvious metaphor for peel. Nevertheless, a certain interpretative effort is involved, and it is therefore preferable to mark the word as partially motivatable due to semantic restrictions.
By contrast, the skin in potato skin is interpreted as a dead metaphor derived from the skin of humans and animals, which has become a full lexical unit of the lexeme skin. This view is supported by the fact that LDOCE includes the fruit and vegetable meaning of skin, but not that of jacket. 101

99 Reasoning along the same lines, Käge (1980: 6) prefers the term figurative motivation.
100 Cruse (1986: 80) defines the lexical unit as follows: “a lexeme is a family of lexical units; a lexical unit is the union of a single sense with a lexical form; a lexical form is an abstraction from a set of word forms (or alternatively - it is a family of word forms) which differ only in respect of inflections.” Thus, the nominal lexeme dream comprises at least the two lexical units ‘a series of thoughts, images, and feelings that you experience when you are asleep’ and ‘a wish to do, be, or have something - used especially when this seems unlikely’ (cf. LDOCE s.v. dream). Each of these lexical units comprises the word forms dream, dreams, dream’s and dreams’.
101 Even if the occurrence or non-occurrence of a particular metaphoric or metonymic lexical unit in a particular word’s dictionary entry is questionable as well (cf. Herbst and Klotz 2003: 39-48) because it is based on the lexicographers’ subjective decisions, it is adopted as a criterion bringing in the intuition of researchers other than the analyst of the present study.

2.2.5 Etymological motivation

So far, motivation has been considered from a purely synchronic perspective - but of course, the contemporary vocabulary items are also interrelated from a diachronic point of view in the so-called etymological motivation (Zöfgen 2002: 189-190): even if they may be unmotivatable now, their original form may have been motivatable in the past.
For example, Hughes (1988: 3) notes that the roots of salt “have spread and grown into the diverse forms of salary, salad, sauce, saucer, sausage, silt and the verb to souse”. While this list is characterised by a certain formal similarity, combined with a quasi-absence of semantic relation, Pei (1962: 2) categorises water, whiskey, hydrant and vodka as a family grouping, which is more or less semantically related, but formally heterogeneous. Of course, the normal language user, who is in the focus of the analyses undertaken here, cannot be expected to link such disparate words. Consequently, former stages of the words under consideration are completely disregarded in the analyses of motivatability, so that the approach can be characterised as contemporary and synchronic. 102 2.2.6 Motivation by foreign elements The treatment of foreign words in the analyses is also an important aspect. Von Polenz (1967: 77) makes the following point for the German language: 103 102 For the separate treatment of historical aspects consult Section 3.5. 103 Von Polenz (1967: 77): “Many loan words remain unmotivated, i.e. they do not stand in a word formation relationship within the German language, and they are not derivable, e.g. Problem ‘problem’, Balkon ‘balcony’, Hobby ‘hobby’. However, as soon as loan words and particularly loan word stems become productive in word formation, their derivatives also become motivated within the German vocabulary - even though sociolinguistic differences make themselves strongly felt. For the normal language user, loan word formations such as Auto(mobil) ‘car’, Autogramm ‘autograph’ and Automat ‘(vending) machine’ are unmotivated words […]. For someone who has a certain knowledge of academic vocabulary or of the vocabulary of particular professions and subject fields, however, they are fully or partially motivated. 
Not so much because that person has learned Greek - which not all educated people and specialists have - but because in the vocabulary mastered by that person, which is enlarged in a certain direction, a series of other words with the same first constituent stands next to them, e.g. Autobiographie, autochton [sic], Autodidakt, Autograph, Autokratie, Autonomie, Autopsie, Autotypie etc., i.e. words for whose second constituents a semantic link within his own vocabulary can be found […]. For such language users, the element auto is a smallest element of meaning, […] not because they can diachronically derive it from Greek, but because this element structures their vocabulary into groups” (my translation).
Viele Lehnwörter bleiben unmotiviert, d. h. sie stehen innerhalb des Deutschen in keiner Wortbildungsbeziehung, sind nicht >ableitbar<: z. B. Problem, Balkon, Hobby. Sobald aber Lehnwörter und vor allem Lehnwortstämme wortbildungsmäßig produktiv werden, erhalten ihre Ableitungen auch innerhalb des deutschen Wortschatzes eine Motivierung, wobei sich aber die sprachsoziologischen Unterschiede stark auswirken. Für den normalen Sprachteilhaber sind z. B. Lehnwortbildungen wie Auto(mobil), Autogramm und Automat unmotivierte Wörter [...]. Für den, der am akademischen Bildungswortschatz oder am Fachwortschatz bestimmter Berufe und Sachgebiete teilhat, sind sie dagegen voll oder teilweise motiviert. Nicht so sehr, weil er Griechisch gelernt hat - das haben nicht alle Gebildeten und Fachleute -, sondern weil in seinem in bestimmter Richtung erweiterten Sprachbesitz noch eine Reihe weiterer Wörter mit dem gleichen ersten Bestandteil danebensteht: z. B. Autobiographie, autochton [sic], Autodidakt, Autograph, Autokratie, Autonomie, Autopsie, Autotypie usw., Wörter, für deren zweite Bestandteile er ebenfalls in vielen Fällen semantische Querverbindungen innerhalb des eigenen Wortschatzes findet [...].
Das Element auto ist für solche Sprachteilhaber ein kleinstes Sinnelement, [...] nicht weil sie es diachronisch vom Griechischen ableiten können, sondern weil dieses Element in ihrem Wortschatz strukturell gruppenbildend wirkt.
All in all, this statement can be accepted, except that it probably takes no intellectual to know the words autonomy and autodidact. Autobiography is a very common word as well (the lemma occurs 496 times in the British National Corpus), and once the difference from biography is known, the normal language user may even arrive at the meaning of the element auto-, which recurs in other words. Summing up, many words with a foreign origin are integrated into the borrowing language 104 to such a degree that it is possible to postulate that the normal language user is capable of assigning a meaning to them on the basis of their equally well-integrated constituents. Therefore, the most frequently recurring foreign elements are treated just like native elements in the analyses of motivatability and expandability. The distinction between more and less integrated foreign elements is of course difficult to draw, but in case of doubt, it is always possible to check whether a particular foreign element is contained in the learners’ dictionaries. 105 For instance, both the English and the German dictionaries contain the combining form auto-, but none contains the more specialised lipo- ‘fat, lipid’. 106
104 Compare Crystal’s (1995: 126) witty remark, though: “When one language takes lexemes from another, the new items are usually called loan words or borrowings - though neither term is really appropriate, as the receiving language does not give them back.”
105 See Section 3.4.1.1.
106 Cf. the Shorter Oxford English Dictionary, abbreviated to SOED in the following, s.v. lipo-.
2.2.7 Interlingual motivation
In this context, the relation of words in one particular language to elements in other languages deserves mention as well.
Due to the common Germanic origin of English and German, many translation equivalents in these languages show a high degree of formal similarity, e.g. apple - Apfel, end - Ende or blind - blind. Hausmann (2002a: 258) speaks of interlingual transparency when it is possible to understand a word in a foreign language by virtue of its formal similarity with a semantically related word in another language. This means that the examples mentioned above would be termed partially interlingually motivatable in the present terminology - blind even fully interlingually motivatable. In addition, it is also possible for words to be transparent both on an intra- and an interlingual level (Hausmann 2002b: 449), e.g. in the case of buttermilk - Buttermilch. The phenomenon of interlingual transparency is particularly important in the case of the so-called internationalisms, i.e. words generally consisting of Greco-Latin constituents and having a similar form and meaning in several European languages (Braun 1979: 96-97). Compare the following examples:
Table 2: English, German, French and Spanish internationalisms
English      German       French       Spanish
definition   Definition   définition   definición
horizon      Horizont     horizon      horizonte
information  Information  information  información
intelligent  intelligent  intelligent  inteligente
million      Million      million      millón
national     national     national     nacional
plan         Plan         plan         plan
president    Präsident    président    presidente
Interlingual transparency is certainly an important factor in the learning of foreign languages that are as closely related as English and German because learners can profit from a similarity between their native language and the foreign language. Nevertheless, the fact that this phenomenon exceeds the boundaries of one language 107 and is furthermore non-compositional inevitably results in its omission from the present study.
107 Compare Leisi’s (1985: 246) and Augst’s (1998: IX-X) definitions of synchronic etymological competence, which both exclude interlingual lexical relations.
2.2.8 Motivatability in the present study
Taking into account everything that has been said about motivation and motivatability in the previous sections, the following conclusions about the analytical aspect of consociation can be drawn:
1. It is synchronic.
2. It is language-immanent.
3. It is (usually) morphosemantic.
4. It is based on the (assumed) normal language user’s judgements.
5. It is based on plausible paraphrases of the complex word that involve the meaning of the formally similar or identical constituents.
6. It may be full or partial.
2.3 Expandability
Expandability is the aspect of consociation complementary to motivatability. The term expandability refers to the fact that it is possible to find a semantically related complex word for a particular analysis item in which the search word is contained: for example, the noun discipline can be expanded into the adjective disciplined and into the compound self-discipline. In a sense, expandability for a whole word is what productivity is for an affix. 108 However, there are two major differences: firstly, although the possibility of creating a new word on the basis of a source word is theoretically enough to prove a word’s expandability, the working definition will only admit words attested in the contemporary language. Otherwise, possible but non-existent forms may reduce the actual degree of dissociation, e.g. tasaying, which is a hypothetical compound formed on the basis of the interjection ta ‘thank you’, but which is not established. 109 In spite of its theoretical expandability, ta would therefore be classified as unexpandable in the framework of the present study because there are no established longer words which contain it.
108 Cf. Bußmann (2002: 537) s.v. Produktivität: “Fähigkeit von Wortbildungselementen zur Neubildung sprachlicher Ausdrücke”, i.e. productivity is the ability of word formation elements to form new linguistic expressions. A related concept is that of word formation activity formulated by Fleischer and Barz (1995: 60): “Wortbildungsaktivität ist die Eigenschaft/Fähigkeit von Lexemen, als Basis von Derivaten und/oder Konstituente von Komposita zu dienen”, “word formation activity is the ability of lexemes to serve as the basis of derivatives and/or a constituent of compounds” (my translation).
109 Quirk et al. (1985: 1522) speak of “potential English words” when a word is possible with respect to the rules of word formation - e.g. *unexcellent -, but is not established. Cf. also Coseriu’s (1978: 44) distinction between system and norm.
Secondly, the viewpoint adopted here leads to the implication that words can be expanded into longer words that are part of the contemporary word-stock, even if the mechanism underlying their creation is no longer productive. Thus warm can be expanded into warmth, even if no new words are formed with the suffix -th any more (Bauer 1983: 48-50). Like motivatability, expandability is considered to be purely synchronic and language-immanent, and it relies on the normal language user’s judgements. However, while it is necessary to define which of the various subtypes of motivation should be admitted as belonging to the analytical direction of consociation, expandability is by definition restricted to the morphological aspect. 110 Instead, the underlying word formation processes play a more prominent role.
2.3.1 Qualitative measurement of expandability
In the approach adopted here, one expansion is considered sufficient proof for an item to be consociated. The expansion chosen and entered in the word list is the most suitable and representative candidate found.
111 Accordingly, the approach adopted here can be described as measuring qualitative rather than quantitative expandability. Of course it would also be conceivable to measure the degree of expandability and ultimately the degree of consociation by counting all compounds and derivatives containing the target word as a base. However, such an approach is quite problematic for several reasons:
1. A quantitative approach aiming at completeness is impossible by definition. For instance, neologisms 112 uttered in unrecorded conversations are impossible to retrieve. As the available sources only cover a minute fraction of everything that is said or written in a particular language, even within a delimited period of time, an attempt at considering quantitative results is very imprecise.
2. The quantitative approach being extremely time-consuming, at least if semantics is to play an important role, the number of analysis items would have to be drastically reduced - which would lead to less representative results.
3. A quantitative approach needs to include qualitative aspects as well, e.g. in order to determine which borderline cases should enter the word count. Consequently, it suffers from the combined problems of both approaches.
110 Again, there may be exceptions analogous to those described in Section 2.2.3.1.
111 It is possible that several expansions are equally suitable, so that a choice has to be made.
112 Even though it is possible to exclude extremely rare words from a quantitative analysis, there may be many words on the route towards institutionalisation which are not recorded in reference works but should nonetheless be counted - an impossible task if they are not present in the minute samples at the disposal of the researchers.
While a quantitative approach tries to cover all possible expansions of a word, the qualitative approach concentrates on the core, i.e.
the relatively obvious best expansions, and leaves out the periphery with its problematic fuzzy boundaries. The degree of expandability is measured on the basis of the best representative’s correspondence to the criteria laid out in the following. Furthermore, the strength of expandability can be measured in terms of the size of the source in which an expansion is recorded, with items from small reference works displaying a higher degree of expandability than those extracted from large corpora. Words for which no expansion can be found within the limited resources of the present study may well be expandable in larger sources, but as such expansions can be assumed to display a relatively low degree of expandability, they can be disregarded, and the words in question are counted as unexpandable.
2.3.2 Partial expandability
Section 2.2.3.1 introduced an intermediate category between full motivatability and the lack thereof, called partial motivatability. The situation in the case of expandability is similar and will therefore be treated accordingly. Instances of partial expandability are distinguished from full expandability and unexpandability, but members of the partial category count as legitimate consociating factors towards the main final results. In contrast to the distinction made in the analysis between motivatability and transparency, the synthetic aspect is covered by the single term expandability, as only semantically related items are considered. In analogy to the position adopted for motivatability, expandability concerns both the spoken and the written modality. The examples given in Section 2.2.3.1 may be re-used here because motivatability is complementary to expandability: if a word is expandable, then the resulting word is motivatable with respect to its constituent and vice versa. In order for a word to be fully expandable from a phonetic point of view, it has to be completely integrated into the longer form.
This concerns the individual sounds and - in the case of words comprising at least two syllables - the word accent as well. If a word fails to meet these conditions, it is only partially phonetically expandable. Thus, measure is fully expandable into measurement, while similar and cigar experience slight changes when expanded into similarity and cigarette and are therefore only partially expandable from a phonetic point of view. Quite analogously, full orthographic expandability is only reached if the chain of letters constituting a word can be completely integrated into a more complex word. Consequently, publish is fully expandable into publisher, while remove - removal and worry - worried are only instances of partial orthographic expandability. However, as the examples treated so far only involve either phonetic or orthographic difficulties, the same position is adopted as in the case of motivatability, namely that full expandability in one of the two varieties is enough to consider a word fully expandable. Again, an intermediate category, partial expandability, is provided if both phonetic and orthographic changes are involved, but of course it may be assumed to play a much less crucial role in the synthetic than in the analytical direction. After all, the rich stock of the vocabulary offers many possibilities of finding a full expansion, e.g. overproud for proud, while the motivational analysis is very much restricted by the source word. In contrast to the motivational analysis, where the analyst has to try to make sense of the given material, the synthetic part of the study allows the researcher to look in all directions for expansions. Consequently, semantic obstacles are not to be expected very frequently either because they can be avoided by turning to other words.
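The orthographic side of the distinction drawn in this section can be illustrated with a small Python sketch. It is not part of the study: it merely assumes that full orthographic expandability can be approximated by checking whether the source word's chain of letters recurs unchanged in the longer word. The function name and the rule of thumb for the partial case (everything but the final letter of the source word is retained, with at least three shared initial letters) are hypothetical choices made for this illustration only. Phonetic expandability, which cannot be read off the spelling, and the semantic relation between the two words are left out.

```python
def orthographic_expandability(source: str, expansion: str) -> str:
    """Classify the written form only: 'full', 'partial' or 'none'.

    Illustrative assumption, not the study's procedure: semantic
    relatedness and the phonetic form are not checked here.
    """
    src, exp = source.lower(), expansion.lower()

    # An expansion has to be longer than the word it expands.
    if len(exp) <= len(src):
        return "none"

    # Full: the complete chain of letters is integrated into the longer word.
    if src in exp:
        return "full"

    # Count the letters shared from the beginning of both words.
    shared = 0
    for a, b in zip(src, exp):
        if a != b:
            break
        shared += 1

    # Hypothetical threshold: at most the final letter of the source is
    # altered, and at least three initial letters are shared.
    if shared >= max(3, len(src) - 1):
        return "partial"
    return "none"


if __name__ == "__main__":
    pairs = [("publish", "publisher"), ("remove", "removal"),
             ("worry", "worried"), ("proud", "overproud")]
    for source, expansion in pairs:
        print(source, "->", expansion, ":",
              orthographic_expandability(source, expansion))
```

Under these assumptions cigar - cigarette comes out as a full orthographic expansion, which is compatible with the text above, where the pair is described as only partially expandable from a phonetic point of view.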
2.3.3 Expandability and word formation
Up to this point, the word word has been used without a proper definition - which is not too problematic as the motivational analysis merely subdivides the lexical items under consideration into any smaller morphosemantically related segments. The verification of expandability, however, requires a more precise definition of what a word is, simply because the search for expansions is supposed to yield words as well. In the literature, the term word has been defined so many times that the discussion fills whole books. 113 For the purpose of this study, a made-to-measure working definition with no claim to general applicability must suffice. Basically, the word word as used here coincides with Matthews’ lexeme, “the fundamental unit […] of the lexicon of the language” (Matthews 1974: 22). 114 Lipka (1992: 70) agrees with Bauer (1983: 13) that word-formation should actually be called lexeme-formation because it results in new lexemes. However, he decides to stick to the institutionalised term word-formation - a model that will be followed here as well. When deciding whether a word can be accepted as the expansion of another lexical item, the word formation process underlying the potential expansion may actually play a role. For instance, the category of compounds is relatively hard to delimit because there is considerable disagreement over what constitutes a member of this category. However, not admitting compounds in the search for expansions would massively distort the results for the lexical relations. 115
113 Cf. Wolff (1969: 21-25) for a discussion of different definitions.
114 Idioms and phraseologisms are not admitted as words in this working definition.
115 Rufener (1971: 9) assumes that compounding is the most common word-formation method in both English and German.
A first problem concerns the spelling of English compounds, which is not homogeneous: 116 some are written as one word and others are hyphenated. The problematic third subgroup, the so-called open compounds, consists of two separate words (Stein 1985: 38). According to Bauer (1998: 65), there are two camps of scholars with regard to their treatment of combinations of two nouns in English:
“The splitters see two classes of noun + noun sequence in English: syntactic constructions consisting of nouns with nominal modifiers, and compounds. The lumpers see a single class, usually identified as a class of compounds.”
Bauer (1998: 66), who discusses possible distinctive criteria, such as word stress and co-ordinability, concludes that “it is, in effect, a lack of evidence for the contrary position which leads to the conclusion that there is a single class, not positive evidence in its favour”. More difficult even than the class of noun + noun compounds are combinations of adjectives and nouns, such as nuclear disarmament and white wedding, which could be either compounds or noun phrases with adjectival premodifiers. As this question is rather difficult to answer, the approach followed here relies on their treatment in the dictionaries used. Such an approach, which considers all lemmas, also admits compounds consisting of more than two free lexical morphemes, e.g. heavy goods vehicle or do-it-yourself. The so-called non-lexicalised compounds (cf. Bußmann 2002 s.v. Zusammenrückung) or phrase compounds (Bauer 1983: 206-207) are treated controversially in the literature. Particularly German linguists are reluctant to confer compound status to such items because they are not formed by the usual word-formation processes, and because the distinction between determiner and head can often not be upheld. Nonetheless, the present study attempts to document lexical consociation where it is present.
As phrase compounds usually fulfil the criteria of wordhood, 117 they are counted as legitimate expansions. Derivatives, the second most important type of word formation, are a fairly straightforward category because they are spelled without intervening gaps, so that there can be no disagreement as to their word status on formal grounds. All derivatives that meet the criteria of formal and semantic similarity are theoretically accepted as full expansions in this study. Nevertheless, there are some suffixations that are on the borderline between derivation and inflection. To begin with, the German suffix -in is attached to nouns denoting people or animals in order to mark a female variant such as Autofahrerin ‘female driver’, derived from Autofahrer ‘driver’. 118 As it is questionable whether this should be regarded as lexical rather than grammatical information, such expansions are avoided where possible but accepted if no better expansion can be found.
116 According to Quirk et al. (1985: 1537), compounds are usually spelled as two words when they are not established. Hyphenation and spelling as one unit are seen as indicative of a word’s growing establishment. However, alternative spellings may also coexist. Consider Stein (1985: 41): “There seems to be an understanding among English lexicographers that they will never list all three spellings - solid, hyphenated, separate words - of a compound even if actually occurring in the language. Usually only one spelling is given. A close study of the dictionaries under consideration reveals that publishing houses often have a preference for one specific style used when the spelling is not fixed.”
117 According to Bußmann (2002 s.v. Wort), words can be isolated by pauses in speech and by blanks in writing, they take inflections, and they can be moved around or replaced in sentences. Even if Plag (2003: 136) denies compound status to such constructions as jack-in-the-box, he still considers them as words.
In English, zero-derivations and certain adverbs pose a problem which is quite similar. The name zero-derivation implies that a suffixation with a formally unrealised affix takes place, thus resulting in a new lexeme. For instance, the noun shower can be converted into the verb to shower. However, one may ask whether this is the only way of dealing with formally identical words that belong to different parts of speech, and indeed, Leisi (1999: 86-87) offers an alternative solution, namely grammatical polysemy: 119
Bei vielen englischen Wörtern ist die Wortart also neutral oder latent. Sie ergibt sich erst in einem gegebenen Text: Die Kategorie der Wortart ist im Englischen also weithin nur noch eine funktionelle und keine formale mehr. […] Eine große Anzahl von Wörtern wie water, air […] besitzt drei Funktionsmöglichkeiten: Substantiv, “Quasi-Adjektiv” (vor anderen Substantiven: water works [...]) und Verb. Sehr wahrscheinlich besitzen nur die wenigsten gebräuchlichen Wörter weniger als zwei Funktionsmöglichkeiten. 120
Consequently, it is possible to treat such pairs of words as if they constituted instances of the same lexeme. Even if the theory underlying the research project does not agree with this statement in its strongest form, grammatical polysemes are avoided as expansions because no formal
118 The usage note s.v. -in in Langenscheidts Großwörterbuch Deutsch als Fremdsprache, abbreviated to LGWDaF in the following, speaks of a masculine base to which the suffix attaches, but this is not entirely true because in many cases the German masculine form is identical with the gender-unmarked form, e.g. in the case of Lehrer ‘teacher’.
119 Actually, Leisi (1961: 259) calls this phenomenon grammatical homonymy.
However, as the meaning of word pairs such as a shower/to shower is closely related, it makes more sense to speak of grammatical polysemy, and to reserve the term grammatical homonymy for formally identical words belonging to different parts of speech that are semantically dissimilar, e.g. a swallow and to swallow.
120 Leisi (1999: 86-87): “In many English words, the part of speech is therefore neutral or latent. It only emerges in a given text, so that the category of part of speech in English is now to a large extent only functional, but not formal any more […]. A large number of words, such as water, air […] have three functional alternatives: noun, ‘quasi-adjective’ (before other nouns: water works […]) and verb. It is highly probable that only a minority of common words have less than two functional alternatives” (my translation).
change is involved. However, they are admitted where no better expansion can be found. The treatment of English adverbs constitutes a very similar case. For instance, Quirk et al. (1985: 1556) state that the suffix -ly “can be very generally added to an adjective in a grammatical environment requiring an adverb (gradable if the adjective concerned is gradable), so that it could almost be regarded as inflectional”. The possible conclusion is that suffixes having only a grammatical role are “like grammatical inflexions and have no place in word-formation. This indeed is how some linguists treat the adverb suffix -ly, and the line is not always clear” (Quirk et al. 1985: 1547). 121 As in the other cases described above, such expansions are avoided where possible but are counted as consociating if no other expansion can be found. As may be clear from the definition of expandability in Section 2.3, the target word is required to be longer than the source word. 122 Therefore, it may come as a surprise that shortenings should be considered at all.
However, this is done for the sake of completeness in the theoretical description. Furthermore, the creation of acronyms - one type of shortening - proceeds in two steps. First, several words are combined to make up a longer unit, and only then is this unit shortened to the first letter of each word. 123 Thus, the word union could be expanded to European Union, which is subsequently shortened to EU. In the case of sponge’s potential expansion into bovine spongiform encephalopathy, the situation is on the one hand identical because the same word formation processes take place, but on the other hand quite different because the latter word is generally known only under its acronym BSE. Similarly, it is likely that comparatively few normal language users are aware of the fact that the word laser is actually an abbreviation for light amplification by stimulated emission of radiation, so that it is not a very suitable expansion candidate for emission. 124 As these group-internal differences make acronyms a very heterogeneous class, and as the shortening of the units leads to a result that is far shorter than the original constituents, acronyms are not considered legitimate expansions in the present study.
121 In his computer-aided count of English vocabulary, Wolff (1969: 37) reduces all -ly adverbs to the corresponding adjectives and gives them an additional code.
122 Kastovsky (1982: 153) would not consider shortenings as belonging to the domain of word formation at all: according to him, only analysable lexemes, i.e. those which can be formally and semantically analysed into smaller motivating constituents, belong to the domain of word formation. Kanngießer (1985: 140), by contrast, speaks of “Wortbildungen durch Abkürzung und Verkürzung”, i.e. of word formation by means of abbreviation and shortening.
123 See Bauer (1983: 237-238) for a discussion of exceptions.
124 Both etymologies come from LDOCE.
125 Another type of non-syntagmatic word formation is blending (Lipka 2002: 145-147). Nevertheless, even if this means that there is no combination of full signs, blends still evoke their constituents to a certain degree. Otherwise it would not be possible to guess the meaning of such an infrequent example as slanguage (Bauer 1983: 236). However, as the original word’s form is typically not fully included in the expanded word, e.g. in the blend smog as compared to its constituent smoke, blends usually do not attain the highest level of expandability, but only an intermediate one. In none of the cases analysed could a word only be expanded into a blend. Nevertheless, blends could theoretically be counted as either full or partial expansions, depending on their degree of similarity to the base. Back-formations are yet another type of shortening. From a diachronic point of view, back-formations are words derived by subtraction of wordformation elements from a longer form. The theory adopted here follows Scheler (1977: 135), who says that synchronically, stagemanage would have to be the base and stagemanager the derivative, and not the other way round. 126 Consequently, backformations among the analysis items will pass unnoticed and are treated in the opposite direction than would be correct from a historical point of view. 127 The fact that clippings, e.g. lab from laboratory, are shorter than their source words prevents them from becoming expansions. However, clippings among the analysis items could be expanded into their full form. Even though there is usually a difference in register (cf. Quirk et al. 1985: 1580), some may argue that clippings and their full forms are actually instances of the same lexeme. Therefore, clippings are only admitted as expansions if no better expansion can be found. 
125 According to Wunderli (1989: 94-95), acronymisation does not produce new lexemes, but this view is not shared here. However, it is theoretically possible to expand some acronyms into longer words, e.g. Aids into Aids victim or BBC into BBC English.
126 Similarly, Kleinstadt would be classified as expandable into kleinstädtisch, even if the latter can be dated back to the 17th century, while the former only appeared in the 19th century. From a synchronic point of view, these two words behave just like Stadt, whose expansion into städtisch is historically correct (Erben 1964: 84).
127 This is also in line with Fill (1980: 17), who believes that the fact that a verb to lase is formed from the acronym laser shows the motivation of the longer word independently of how it was formed.
2.3.4 Expandability in the present study
After dealing with expandability in a way that is approximately identical to the treatment of motivatability, the most important points on expandability can be summed up as well:
1. It is synchronic.
2. It is language-immanent.
3. It is morphosemantic.
4. It is based on the (assumed) normal language user’s judgements.
5. It is based on sensible paraphrases of the expansion that involve the meaning of the formally similar or identical target word.
6. It considers only established words.
7. It may be full or partial.
8. It is restricted to particular word-formation processes.
2.4 Summary
A word is dissociated if it is neither motivatable nor expandable. Consociation, by contrast, is the fact that a word can be either analysed into motivating constituents or expanded or both. These two processes result in the creation of a lexical network, a morphosemantically related word family.
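The definitions summarised here amount to a simple decision rule, which the following Python sketch spells out. It is an illustration only and does not reproduce the study's database: the class name, the field names and the three-level scale are hypothetical, and the values given for the two example lemmas are assumptions apart from the points made in the text, namely that discipline can be expanded (e.g. into self-discipline) and that ta counts as unexpandable. In line with Section 2.3.2, partial cases are treated as consociating.

```python
from dataclasses import dataclass

# Hypothetical three-level scale used for both directions of consociation.
LEVELS = ("none", "partial", "full")


@dataclass
class ListItem:
    lemma: str
    motivatability: str  # analytical direction: "none", "partial" or "full"
    expandability: str   # synthetic direction: "none", "partial" or "full"

    def __post_init__(self) -> None:
        # Reject values outside the assumed scale.
        for value in (self.motivatability, self.expandability):
            if value not in LEVELS:
                raise ValueError(f"unknown level: {value!r}")


def is_consociated(item: ListItem) -> bool:
    """A word is consociated if it is motivatable or expandable or both;
    partial cases count as consociating."""
    return item.motivatability != "none" or item.expandability != "none"


def status(item: ListItem) -> str:
    """Return the label used in the summary: dissociated means that the
    word is neither motivatable nor expandable."""
    return "consociated" if is_consociated(item) else "dissociated"


if __name__ == "__main__":
    # Illustrative values; the motivatability entries are assumptions.
    items = [
        ListItem("discipline", motivatability="none", expandability="full"),
        ListItem("ta", motivatability="none", expandability="none"),
    ]
    for item in items:
        print(item.lemma, status(item))
```

Keeping the two directions as separate fields mirrors the distinction between the analytical and the synthetic aspect, so that results can still be reported for motivatability and expandability individually.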
3 Design of the research project
Even though dissociation is a widespread concept in English linguistics, at least at German universities, no studies so far have been devoted to empirically proving or disproving the theory that the English vocabulary is more isolated than the German lexicon. In particular, word-family integration - a word’s presence in other, usually longer, words - does not yet seem to have been investigated empirically and systematically. Furthermore, even though there are many works dealing with lexical motivation from a theoretical point of view, hardly any linguists have applied their theories to the study of larger samples of word material. One possible reason is given by Leisi (1981: 386-387), according to whom traditional linguistics considered itself more a part of the humanities and therefore did not demand certain types of precision, while modern linguists understand their field of work as a science, which aims at quantitative descriptions. Another reason may reside in the fact that much important lexical research was carried out between the 1960s and the 1980s, i.e. at a time when computer tools were not as powerful and as easily accessible as they are now: truncated search, for example, finds members of a word family anywhere in a computerised dictionary, while the researchers of the past would have had to restrict themselves to the alphabetical environment of the word in question. 128 Nevertheless, as integration into word families applies to both the analytical and the synthetic direction, a number of studies on the motivation of the vocabularies in question can still be considered the predecessors of the present study. The most important comparative study of lexical motivation in English and German that comes closest to the central aims of the present project is Fill’s (1980) Wortdurchsichtigkeit im Englischen.
According to Fill (1980: 77), he is the first researcher to attempt a quantitative comparison of motivation in English and German. 129 For German and French, a comparison of the respective degrees of motivation of both languages was carried out by Scheidegger (1981) in his Arbitraire et motivation en français et en allemand. Of course there are related works, 130 some of them concentrating more on the English language, but German and French linguistic literature constitute the most important sources of inspiration for the present study. The following sections will discuss what aspects of these works 131 have influenced the current research project. The most important factors that were taken over from previous studies are the following:

1. It is necessary to analyse a large number of cases (cf. Fill 1980).
2. Problematic cases are discussed extensively (cf. Scheidegger 1981).
3. Frequency is a criterion for the choice of the material (cf. Scheidegger 1981).
4. The word lists are made accessible (cf. Scheidegger 1981). 132
5. If a word’s meaning is related to the meanings of its parts, it is motivatable even if the constituents explain the meaning insufficiently (cf. Scheidegger 1981).
6. There is motivatability even if it applies only to one meaning of a word (cf. Scheidegger 1981) - provided that a central lexical unit is motivatable.
7. Small formal divergence does not prevent motivatability (cf. Scheidegger 1981).

Other aspects discussed in the research literature were rejected, though:

1. The present study does not use translation equivalents but frequency equivalents (vs. Fill 1980).
2. Instances of partial motivatability are not counted as unmotivatable (vs. Fill 1980).
3. Function words are not excluded (vs. Fill 1980 and Scheidegger 1981).
4. Motivatability in the present study typically requires distinct constituent signs in a determining relationship, but this is no defining criterion (vs. Scheidegger 1981).
5. Clippings are not considered to be motivatable by their longer forms (vs. Scheidegger 1981).

128 Another way for them would have been setting up hypotheses about possible expansions and looking the potential words up individually in paper dictionaries. As these are extremely time-consuming and unreliable methods, this explains the large amount of research that still needs to be carried out in the domain of word families.
129 Only Rufener’s (1971) study, which is restricted to compounds, is mentioned as a predecessor. Bearing in mind that Fill argues that consociation consists of motivation only (cf. Section 1.3), it is possible to say that his study has tested the consociation of the English and German vocabulary according to one of its possible definitions.
130 Among others, Augst (1975), Augst (1998), Erk (1985), Kandler and Winter (1992), Ljung (1974), Lu (1998), Mitterand (1968) and Rufener (1971) can be mentioned, even if they only play a marginal role in this context.
131 Including Blumenthal’s (1997) critical comments on Scheidegger (1981).
132 Readers who are interested in the study’s material and database are invited to get in contact with the author either directly or via the publisher.

3.1 Material

The material for the research project was chosen with the aim of representing both the English and the German vocabulary in such a way that they can be compared with each other. Representativity, of course, is an aspect already open to debate when only one language is concerned, and even more so when a common ground for the comparison of several languages needs to be found. The study attempts to achieve this aim by applying lexical analyses to a large amount of word material chosen by following a frequency-based approach. 133

3.1.1 Preliminary considerations

The present study is based on the assumption that it is legitimate to contrast a particular word frequency range in different languages.
This means that, provided there are two corpora which are each representative of a particular language, it should be possible to analyse the vocabulary contained in the same frequency range 134 and thus to compare the lexicon of the two languages in question. 135 In contrast to Fill (1980), the existence of translation equivalents plays no role. The aim of the present study is to investigate whether the English and the German language behave differently with respect to their words’ integration into word families - and this can also be achieved by approaching the vocabulary from a quantitative and functional point of view. Assuming that words occurring frequently in a language are particularly important for the language, 136 and assuming in addition that this is a universal characteristic, a comparison of the same number of similarly frequent words from two languages should ensure that the items under analysis have a comparable status.

133 Related approaches can already be found in Mitterand (1968) and Scheidegger (1981). Both authors work with a frequency-based basic French vocabulary book: Mitterand with the two volumes of Mauger and Gougenheim’s Français élémentaire (1963 and 1964) and Scheidegger with its successor, the Français fondamental (Gougenheim et al. 1967) as well as with Oehler’s (1966) Grundwortschatz Deutsch.
134 This is not only assumed for the most frequent items of English and German, which are investigated in the present study, but also for the less frequent sections of the corpora. While the highest frequency ranges are quite likely to contain a large number of translation equivalents, such as corresponding pronouns, articles and auxiliary verbs, it is highly improbable that this would be the case for the low-frequency words. After all, it is a well-known fact that the makeup of the lower-frequency ranges is particularly affected by the composition of the particular corpus used: Hausser (1998: 10) points out that 52.8% of the types in the British National Corpus occur only once. Even though certain items with a frequency of one such as audiophile or dustheap can be found in dictionaries as well (Hausser 1998: 11) and are consequently not ad-hoc word formations, the occurrence of such words in a particular sample is definitely a matter of chance.
135 This claim is only made for the languages treated in this study, which are linked by a common Indo-European ancestry. However, it is possible that a common core of rough translation equivalents in the higher frequency ranges may turn out to be a more universal feature.
136 Note, however, that first language acquisition in its early phases does without precisely the most frequent items, namely the grammatical words. Crystal (1997: 245) speaks of the “‘telegraphic’ character of early sentences”.

However, if only unedited word forms are counted in a sentence such as John likes what his friends like, the computer retrieves a word form likes and a word form like, of which there is only one instance each. The aim of this study, though, is to look not at word forms but at lexemes. Be, am, are, is, was, were, being and been will therefore not be treated as if they were eight different words, but rather as different forms of one and the same lexeme. In English, most verbs have far fewer than eight word forms, e.g. play, plays, playing, played, but in German, it is quite common for verbs to have a multitude of different representatives - consider the forms of the regular verb ‘to play’ spielen, spiele, spielst, spielt, spielte, spieltest, spielten, spieltet, spielend, gespielt.
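The difference between counting word forms and counting lexemes can be illustrated with a minimal sketch based on the example sentence above. The form-to-lemma table is a hand-written toy mapping assumed for this illustration; it is not the output of the lemmatisers used for the BNC or DWDS lists, and the function names are hypothetical.

```python
from collections import Counter

# Toy mapping from word forms to lemmas (assumption for illustration only).
FORM_TO_LEMMA = {
    "likes": "like", "like": "like",
    "friends": "friend",
    "am": "be", "are": "be", "is": "be", "was": "be",
    "were": "be", "being": "be", "been": "be", "be": "be",
}


def count_word_forms(tokens):
    """Count every distinct word form separately."""
    return Counter(tokens)


def count_lemmas(tokens, form_to_lemma=FORM_TO_LEMMA):
    """Count lexemes: forms found in the mapping are subsumed under
    their lemma, all other tokens are counted as they stand."""
    return Counter(form_to_lemma.get(token, token) for token in tokens)


tokens = "john likes what his friends like".split()
form_counts = count_word_forms(tokens)   # likes and like counted apart
lemma_counts = count_lemmas(tokens)      # both subsumed under like
```

In the form-based count, likes and like each occur once; in the lemma-based count, the lexeme like occurs twice.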
A count of word forms would clearly distort the comparative frequency: as occurrences of a particular lexeme are distributed among the different word forms, this would give the impression that the German verb spielen is far less frequent than its English equivalent to play, even if both lexemes should have precisely the same overall frequency. Therefore, lemmatised frequency lists that subsume all inflectional variants of a lexeme under the same list item are used. 137 This procedure is supported by the results of a preliminary study with the 300 most frequent items from an unlemmatised word list based on the German IDS Corpora: 138 the unlemmatised items proved to be impractical for the lexical analysis because the large number of inflected forms would have required many additional regulations. 139

Nevertheless, lemmatised lists have their drawbacks as well. Their most important disadvantage is the fact that they do not work in precisely the same way for different languages - not only because the languages are inherently different, but also because the lemmatising programmes used for the different corpora adhere to different principles. As this makes the word lists less easily comparable, the following sections will describe in more detail both the corpora used in this study and the principles underlying the elaboration of the word lists.

137 Thus, house, houses, house’s, houses’ and Haus, Hauses, Häuser, Häusern are counted as a single English or German item, respectively.
138 See Section 3.4.1.1.
139 One problem that frequently poses itself in such a treatment is the question whether an item such as dieser ought to be considered motivatable because it contains two morphemes, namely dies and -er, or whether it ought to be considered unmotivatable because the second is not a lexical morpheme contributing to the word’s meaning in the strictest sense but only a functional morpheme adding grammatical information. Quite similarly, inflection considerably complicates matters for expandability: it has to be decided whether sind (‘are’) is not expandable at all, or whether expansions of its infinitive sein (‘to be’), such as Dasein (‘existence’), should be admitted. Due to the fact that German has far more inflection than English, a restrictive treatment of inflected forms would increase the number of either unmotivatable or unexpandable words in German.

3.1.2 Corpora

3.1.2.1 British National Corpus

For English, the 100-million-word British National Corpus (BNC) is an obvious choice, as it constitutes a fairly large, tagged, easily accessible and well-documented collection of texts. One of its main advantages is the fact that it was planned as a balanced corpus, with best-seller lists, library circulation statistics and catalogues of books and periodicals influencing the quantity and titles of the chosen samples (Aston and Burnard 1998: 28-29). Table 3 presents the domains covered in the BNC. 140

Table 3: Relative distribution of words across domains in the BNC

Domain                        %
Imaginative                   21.91
Arts                           8.08
Belief and thought             3.40
Commerce and finance           7.93
Leisure                       11.13
Natural and pure science       4.18
Applied science                8.21
Social science                14.80
World affairs                 18.39
Unclassified                   1.93

140 Adapted from Aston and Burnard (1998: 29).

Ten percent of the material in the BNC, that is about 10 million words, is made up of transcribed spoken material. The remaining written material can be categorised according to the medium of the texts: 141

Table 4: Relative distribution of words across media in the BNC

Medium                                        %
Book                                          58.58
Periodical                                    31.08
Miscellaneous published (brochures etc.)       4.38
Miscellaneous unpublished (letters etc.)       4.00
Written-to-be-spoken (play scripts etc.)       1.52
Unclassified                                   0.40

141 Adapted from Aston and Burnard (1998: 30).

Even if a higher percentage of spoken material in the corpus were desirable - after all, it is a widely shared view that the average language user encounters far more spoken than written language in everyday life -, the BNC can be considered to be fairly balanced 142 and a good source for the English word frequency lists. For the English material, Kilgarriff’s lemmatised, frequency-sorted word list lemma.num (<http://www.kilgarriff.co.uk/BNC_lists/lemma.num>, 31.10.2006) was used. 143 This list considers only the 6,318 words occurring more than 800 times in the BNC. It does not contain numbers, names and other capitalised items - with the exception of the personal pronoun I. The word list also contains the parts of speech of all items, which were taken over unedited. The creation of the list is described in Kilgarriff (1997).

3.1.2.2 DWDS Core Corpus

Unfortunately for a contrastive study such as the present one, there is no German corpus that can be considered perfectly equivalent to the British National Corpus. Nevertheless, there are two publicly accessible computerised corpora that can be used in the construction of German word lists. The corpus collection of the Institut für Deutsche Sprache (IDS) in Mannheim contains about 1,600 million words of running text. However, with a majority of newspaper texts, the corpus is far from being balanced, and is much larger than the BNC. 144 Therefore, the IDS Corpora were only used for the preliminary studies and in the search for additional expansions. 145 The contrastive analysis of German is based on the DWDS Core Corpus of the Berlin-Brandenburgische Akademie der Wissenschaften (BBAW). Within the scope of the project of the Digitales Wörterbuch der deutschen Sprache des 20.
Jahrhunderts (DWDS) - a computerised dictionary of 20th century German -, a 1 billion word corpus was created. Its core consists of a 100-million-word corpus that is publicly accessible and claims to be balanced and representative of the German language. 146 The makeup of this DWDS Core Corpus is indicated as follows: 147

Table 5: Distribution of text types in the DWDS Core Corpus

Domain                                         Translation                   % of tokens
Schöne Literatur                               Fictional texts               26
Journalistische Prosa                          Newspaper texts               27
Fachprosa                                      Scientific texts              22
Gebrauchstexte                                 Non-fictional texts           20
(Transkribierte) Texte gesprochener Sprache    (Transcribed) Spoken texts     5

However, as the spoken data is not directly included in the Core Corpus but represents a separate corpus of its own, 148 the figures above need to be slightly modified to yield 26.40% words from fictional texts, 27.87% from newspapers, 24.23% from scientific texts and 21.50% from non-fictional texts respectively. With due caution, it can be stated that the DWDS Core Corpus is comparable to the BNC in several respects:

1. They both share the size of 100 million words.
2. With a proportion of about 22% in the BNC 149 and roughly 26% in the DWDS Core Corpus, the ratio of words from fictional texts is relatively similar.
3. The 27.87% of tokens from newspaper texts in the DWDS Core Corpus are matched by 31.08% from periodicals in the BNC (Aston and Burnard 1998: 30).
4. With 24.23%, the proportion of words from scientific texts in the DWDS Core Corpus is also highly similar to the combined figures for natural and pure science, applied science and social science in the BNC, namely 27.19%. 150

Despite these similarities, it must not be overlooked that there are still rather important differences between the two corpora: most importantly, the lack of spoken data in the DWDS Core Corpus itself prevents a perfect comparison with the BNC. Moreover, while the vast majority of texts in the BNC were written between 1975 and 1993, and only a small minority may go back to 1960, the German corpus’ aim is to cover the whole of the 20th century.

142 Cf., however, Nation’s (2004: 3-4) criticism that it is not appropriate to use unchanged BNC frequency word lists for syllabus design because the BNC “is predominantly a corpus of British, adult, formal, informative language, and most English learners in primary and secondary school systems are not British, are children, and need both formal and informal language for both social and informative purposes.”
143 In 2004, an identical word list was downloaded from the domain <ftp://ftp.itri.bton.ac.uk/bnc/lemma.num>.
144 Cf. <http://www.ids-mannheim.de/cosmas2/referenz/korpora.html> (31.10.2006).
145 However, these limitations can be partly overcome by the creation of customized subcorpora, which offer a wealth of possible research processes.
146 It is even claimed that the DWDS Core Corpus is at least on a level with the BNC in terms of selection and exploitation: “Mit dem DWDS-Kerncorpus steht der Sprachforschung und allen Sprachinteressierten zum ersten Mal ein dem British National Corpus (BNC) in Auswahl und Erschließung zumindest ebenbürtiges deutschsprachiges Textcorpus zur Verfügung” (<http://www.dwds.de/ueber>, 31.10.2006). However, one must ask oneself whether corpora can really be representative of a particular language. One may actually argue that every corpus is only representative of itself because there are no entities consisting of 10% spoken language and 90% written language etc. (pointed out by Dieter Götz).
147 Cf. <http://www.dwds.de/textbasis/kerncorpus>, 20.11.2006. All figures for the DWDS Core Corpus refer to its version 0.95.
148 Cf. <http://www.dwds.de/textbasis/kerncorpus>, 20.11.2006. The corpus of spoken German only became available after work on the present project had been completed.
149 Calculated from the data in Aston and Burnard (1998: 29).
150 Calculated from the data in Aston and Burnard (1998: 29).

Table 6 gives an overview of the distribution of the word tokens across decades: 151

Table 6: Distribution of the DWDS word tokens across decades

Decade    Tokens        %
1900s      9,551,032     9.49
1910s     10,952,663    10.89
1920s     11,233,864    11.17
1930s     11,101,650    11.04
1940s      9,509,420     9.45
1950s     11,037,474    10.97
1960s      8,713,786     8.66
1970s      8,929,076     8.88
1980s      9,647,722     9.59
1990s      9,924,306     9.87

Nevertheless, one may be confident that the larger chronological range of the DWDS Core Corpus will only lead to a minor decrease in comparability with respect to the present study’s research question.

The DWDS Core Corpus can be used to create PoS-tagged frequency lists. Unfortunately, though, its PoS-categories 152 are too fine for the purpose of the present study. Thus, the verb ändern ‘to change’ appears three times in the list of the 3,000 most frequent PoS-tagged lemmas, namely as full infinitive VVINF, as past participle VVPP and as finite verb VVFIN. This procedure inflates the German word list with items that would be treated as identical in the English list. Since the basic parts of speech distinguished in the BNC frequency lists (nouns, verbs etc.) can usually be identified without difficulty in German, thanks to infinitive endings in verbs, capitalisation of nouns etc., only a lemmatised frequency list without the fine-grained PoS-distinction described above was used in the analyses.
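The effect of fine-grained PoS tags on a frequency list can be made concrete with a small sketch in which entries that differ only in their tag are merged under one lemma. The frequencies are invented for illustration, the function name is hypothetical, and the sketch does not claim to reproduce how the lemmatised DWDS list used in the study was actually generated.

```python
from collections import defaultdict


def merge_pos_variants(entries):
    """Collapse (lemma, tag, frequency) entries into one entry per lemma.

    Returns (lemma, total_frequency) pairs sorted by decreasing frequency,
    with alphabetical order as a tie-breaker.
    """
    totals = defaultdict(int)
    for lemma, _tag, frequency in entries:
        totals[lemma] += frequency
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Invented frequencies; only the lemma and the three STTS verb tags
# (VVINF, VVPP, VVFIN) are taken from the text above.
tagged_entries = [
    ("ändern", "VVFIN", 200),
    ("ändern", "VVINF", 120),
    ("ändern", "VVPP", 90),
    ("Haus", "NN", 350),
]

merged = merge_pos_variants(tagged_entries)
```

With these invented figures, the three tagged entries for ändern collapse into a single item with a total of 410, which then ranks above Haus with 350, whereas each tagged variant on its own would rank below it.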
3.1.3 Criteria for the extraction of frequency lists from the corpora

The present study pursues the aim of describing the most frequent vocabulary. This basic underlying principle results in a very obvious point of departure, namely the item occurring most often in a particular corpus, but it leaves open where to draw the line between the most frequent and the less frequent items: the sample should be large enough to offer relevant insights into the language, but it should be small enough to be as independent as possible from the particular corpus used for the elaboration of the frequency list. A length of 2,500 items is plausible because this number lies between the figures 2,000 and 3,000, which seem to be recurrent in the domain of frequency lists and pedagogical word lists:

1. The latest edition of the Oxford Advanced Learner’s Dictionary (OALD) has the list of the Oxford 3000. These words, which are printed in larger type and followed by a key symbol in the dictionary entries, are partly based on frequency and “have been carefully selected by a group of language experts and experienced teachers as the words which should receive priority in vocabulary study because of their importance and usefulness” (OALD: R99).
2. The curriculum for English as a subject at the highest level of secondary education, the Gymnasium, in Bavaria 153 used to expect the pupils to have acquired more than 3,100 English words by the end of their schooling (Amtsblatt 1992: 72). The latest curricula, however, do not mention precise numbers of words any more. 154
3. The Longman Dictionary of Contemporary English marks the top 3,000 most frequent words from the Longman Corpus Network (LDOCE: xi, xiv), while at the same time using a defining vocabulary of only 2,000 words (LDOCE: xi).

The original German word list comprised 3,000 items. After the exclusion of the items described in Section 3.2.1, it was reduced to precisely 2,500 list words - a figure that was then used in the delimitation of the BNC sample. All this speaks for the consideration of the 2,500 most frequent items in order of decreasing rank, i.e. with the highest-frequency item occupying rank number one. However, as soon as several words on a corpus-based list occur with the same frequency, they theoretically share the same rank, but as a matter of convention, each rank contains only one item. Within such same-frequency bands, the items may then be arranged according to alphabetical order, as in the present case, backward alphabetical order, part of speech - provided that this is encoded - etc. While this is unimportant for most items in the frequency band, it becomes important in determining the cut-off points of the sample. There are two options if the intended cut-off point does not coincide with the limits of same-frequency bands: either cutting through the bands in order to arrive precisely at the previously fixed number of words, or including only complete same-frequency bands, which may result in slightly heterogeneous numbers for samples from different corpora. 155 A third option consists in the search of extracts from both corpora whose frequency-range cut-off points are exactly identical. However, this is a matter of chance particularly for lower frequency bands. Of the remaining alternatives, the first one is chosen for statistical reasons. Even if the method is not without its difficulties, the consequences of cutting through same-frequency bands in a quasi-random fashion must not be overestimated. After all, as mentioned above, the corpora to be compared can be expected neither to be a hundred percent representative of their respective languages nor to be a hundred percent identical, so that all of the values obtained must necessarily be approximate anyway.

151 Calculated on the basis of the figures at <http://www.dwds.de/textbasis/kerncorpus>, 20.11.2006.
152 The parts of speech in the DWDS Core Corpus are based on the STTS tags of the Stuttgart/Tübinger Tagset (sources: Alexander Geyken and <http://www.ifi.unizh.ch/CL/tagger/UIS-STTS-Diffs.html>, 25.05.2005). They are lemmatised with TAGH (cf. Geyken and Hanneforth 2006) and PoS-tagged with Bryan Jurish’s moot-tagger (cf. <http://www.dwds.de/erschliessung/pos_tagger>, 31.10.2006).
153 Germany has no nationwide generally accepted curriculum, but the schooling is part of the sovereignty of the individual federal states. As the research project was carried out in Bavaria, the curriculum of this federal state was chosen for reasons of familiarity and geographical relevance.
154 <http://www.isb.bayern.de/isb/index.asp?MNav=0&QNav=4&TNav=0&INav=0>, 14.11.2006.
155 The discrepancy caused by this procedure is inversely proportional to the frequency of the words: the lower the frequency, the larger the same-frequency bands and the larger the possibly necessary shifts (cf. Hausser 1998: 10).

3.2 Status codes

In a first step, the word lists from the two corpora were converted into tables, with each item occupying a line of its own. Then, a status column was introduced, in which items are either unmarked, which means that they are full list items intended for analysis, or they are provided with one of the four codes x, a, n or d. This procedure makes it possible to keep the complete original frequency lists 156 while at the same time allowing selective sorting.

Table 7: Status codes

Code          Meaning                                        Status
(unmarked)    full analysis items                            included in the analysis
x             unwanted items                                 excluded from the analysis
a             shortenings (except clippings)                 excluded from the analysis
n             proper nouns                                   excluded from the analysis
d             items derived from numbers and proper nouns    excluded from the analysis
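The ranking convention of Section 3.1.3 and the status codes of Table 7 can be combined in a short sketch: items are ordered by decreasing frequency, alphabetically within same-frequency bands, coded items are skipped, and the sample is cut at a fixed size even if this cuts through a band. The lemmas, frequencies and names below are invented for illustration; the sketch is one possible reading of the procedure, not the tooling used in the study.

```python
from typing import NamedTuple

# The four exclusion codes of Table 7; unmarked items have an empty status.
EXCLUDED_CODES = {"x", "a", "n", "d"}


class ListItem(NamedTuple):
    lemma: str
    frequency: int
    status: str = ""  # "" = unmarked, i.e. a full analysis item


def analysis_sample(items, size):
    """Return the `size` most frequent unmarked items.

    Ties in frequency are broken alphabetically, so the cut-off may fall
    inside a same-frequency band (the first option described above).
    """
    ranked = sorted(items, key=lambda item: (-item.frequency, item.lemma))
    kept = [item for item in ranked if item.status not in EXCLUDED_CODES]
    return kept[:size]


# Invented toy list: four items share the frequency 40.
toy_list = [
    ListItem("the", 100),
    ListItem("USA", 50, "a"),     # shortening, excluded
    ListItem("house", 40),
    ListItem("zebra", 40),
    ListItem("Berlin", 40, "n"),  # proper noun, excluded
    ListItem("garden", 40),
]

sample = [item.lemma for item in analysis_sample(toy_list, 3)]
```

With a sample size of three, the toy list yields the, garden and house: the coded items are skipped, and zebra falls outside the sample although it has the same frequency as garden and house.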
156 Where necessary, the form of the DWDS list items was changed to the usual word form, e.g. in the case of *genennt, which was turned into genannt ‘called’, and *Sammelung was changed to Sammlung ‘collection’. The form from the original list is retained in a separate column.

3.2.1 Excluded items

Due to the fact that different lemmatising programmes are used in the compilation of the original frequency lists, certain items deemed superfluous for the analyses appear on one of the lists, but not on both, and so an attempt towards standardisation is made by introducing the status label x for excluded. All items bearing this label are omitted from the final analysis list, e.g.:

1. punctuation marks and symbols, such as . , ? * or $
2. numbers written as combinations of figures, such as 18 or 1990
3. numbers written as words, such as acht or tausend
4. individual letters, such as S or e, with the exception of a and I in the English list because the latter are words consisting of a single letter 157
5. spelling variants of items already present in the list; e.g. organization/organisation is only admitted once
6. words that are not recorded in the dictionaries - either not at all, e.g. German *Gutt, or not in the part of speech indicated in the word-frequency list, e.g. the verb car 158
7. items occurring more than once on the list for some reason, e.g. the subordinating conjunction dass in the DWDS list.

The category x faces the problem that the decision whether an item is a word in its own right is not always unequivocal, e.g. in the case of herrschend ‘reigning’ or entwickelt ‘developed’, which could either be inflected verb forms or zero-derived adjectives. 159 In cases like these, the strategy adopted was to discard list items not included in the English and German dictionaries used in the present study.

3.2.2 Shortenings

Acronyms, e.g. DM and USA, and abbreviations, such as Mrd.
or St., are marked with the letter a and excluded from the analyses for reasons that are very similar. Even though acronyms are motivatable, this is only true for the underlying complex expression, but not for the surface combination of letters - for example, USA stands for United States of America, which is a fairly self-explanatory construction, as the USA is a federation of states situated on the American continent. The acronym USA, however, allows no such morphosemantic conclusions unless the full wording of the acronym’s deep structure is activated as an intermediate step. Including acronyms as unmotivatable items by definition would therefore inflate the figures for unmotivatable words, while counting them as motivatable items would result in an unjustified increase in motivatability. Consequently, acronyms are excluded from the analysis on principle.

157 The fact that B, C, R, U, Y, 2 and 4 can stand for the English words be, see, are, you, why, too/to and for in text messages and word games (cf. Mander 2000) is not taken into account.
158 Such items can be taken to be only erroneously included in the high-frequency list.
159 Landau (1996: 91) even calls this “one of the most difficult grammatical problems for monolingual lexicographers”.

It is also very obvious that abbreviations, particularly those written with a full stop, have a referential function. German St., for example, points to its full form Sankt, which has the meaning ‘Saint’ in names. 160 Counting this as an instance of motivatability can hardly be justified, though, as the combination of letters St. is shorter than the potentially motivating Sankt. Also - in contrast to the acronyms with their underlying complex constructions -, abbreviations typically refer to one word only, and not necessarily to a complex one, as in the case of Sankt, which is not motivatable itself. On the other hand, St.
is not completely unmotivatable either if a very broad definition of motivatability is applied: just as the meaning of complex words can be partially derived by looking at their constituents, the meaning of abbreviations can be found out by looking at the word they relate to. However, this argument can in turn be contradicted by the assumption that an abbreviation and its full form actually constitute only different formal realisations of one and the same lexeme. Consequently, all abbreviated items are also excluded from the analysis list.

As far as motivatability is concerned, the above line of argument applies to clippings as the third type of shortening as well. However, clippings are often perceived as words in their own right - particularly when they are used more frequently than the full form, e.g. bus, whose full form omnibus is labelled as British English and old-fashioned in LDOCE. Therefore, clippings are not marked in any way and are included in the analysis list.

3.2.3 Proper nouns

Proper nouns are marked with the letter n. They include names such as John, towns such as Berlin, currencies such as Dollar, etc. Proper nouns are excluded from the analysis because they belong to the periphery of the vocabulary in spite of their high frequency and - even more importantly - because it is highly questionable whether proper nouns can ever be consociated: after all, Bußmann (2002 s.v. Eigenname) remarks that there is disagreement on whether proper nouns have meaning. Without meaning, there is no motivatability, 161 and motivatability is one of the prerequisites for consociation. Even though expandability as the second aspect of consociation seems to exist for more than one proper noun, 162 the quasi-nonexistence of motivatability in proper nouns makes a comparison with words from other parts of speech a priori impossible, which speaks in favour of their omission from the analysis.

However, the status code n is particularly problematic due to the fact that it is not always easy to determine the limits of the category of proper nouns, 163 especially in context-free word lists. The items that represent the days of the week and the months, for instance, fulfil some of the typical criteria for proper nouns, but not others: 164 they are usually but not always used without a determiner, they can be said to stand for singular if changing referents, but they do not usually have a genitive. Just like proper nouns, the names of the German months are never and the days of the week are rarely ever used in the plural form in the DWDS Core Corpus (<http://www.dwds.de>, 13.10.2006). However, considering that weekdays and months form word fields and are translatable, it seems that they are overall closer to the category of common nouns than to that of proper nouns. Nevertheless, the fact that they are spelled with a capital letter in English - which is indicative of a proper noun - suggests the alternative categorisation. Furthermore, this orthographic criterion plays a role in the establishment of frequency lists, as weekdays and months occur neither on Kilgarriff’s BNC frequency lists 165 nor among the 3,000 most frequent words of the spoken and written language in LDOCE. It is possible that languages differ in this respect, so that the days of the week and the names of the months should count as proper nouns in English, but be closer to the category of common nouns in German. 166 Nevertheless, in a contrastive study, it makes sense to either include or exclude such words in both languages. Since the English word list omits the relevant items, it is easiest to apply the same approach to the German language.

Directions also form a small word field on the borderline between common and proper nouns. However, while Süden etc. are capitalised in German, English south etc. are not necessarily. This may be the reason why they are included in Kilgarriff’s BNC frequency list. For reasons of systematicity, though, they are also excluded from the analysis.

160 Alternatively, St. can also have the meanings Stück ‘piece’ and Stunde ‘hour’ (Duden Universalwörterbuch s.v. St.).
161 However, there are some exceptions. For instance, geographical names that take a particular ending are often not only recognisable as being human settlements, but also as belonging to a particular region, e.g. place names in -ing such as Ismaning or Altötting, many of which are located in the South of Germany (pointed out by Philipp Stockhammer). Furthermore, geographical names such as Mainhausen and Neckarsulm betray that the corresponding localities are situated on the rivers Main and Neckar respectively - but sometimes there is misleading transparency in proper nouns, too, as the city of Mainz is actually closer to the Rhine than to the river Main.
162 For example, the former German currency Mark can be expanded into Markstück.
163 Cf. Bußmann (2002 s.v. Gattungsname), who states that there is no clear-cut dividing line between common and proper nouns.
164 Criteria from Glück (2000 s.v. Eigenname).
165 Kilgarriff states that “Numbers, names, and items that would usually be capitalised are excluded” (<http://www.kilgarriff.co.uk/bnc-readme.html>, 13.10.2006). By contrast, the Oxford 3000, which considers not only frequency but also usefulness (cf. Section 3.1.3), includes the words for the days of the week and the months.
166 This assumption is supported by the fact that the STTS tags in the PoS-tagged DWDS Core Corpus list categorise the words for the days of the week and the months as common nouns.

3.2.4 Items derived from numbers, proper nouns and acronyms

The last status label in Table 7 is the d for items that are derived 167 from proper nouns, such as Berliner, 168 and for items that are derived from numbers, such as Dreier from drei ‘three’.
These items are special in that they occupy an intermediate place between numbers/proper nouns on the one hand and common nouns on the other hand, but as there seems to be a certain inclination towards numbers and proper nouns, d-labelled items are not included in the final analysis list either.

3.3 Analysis of motivatability

For each item on the list that is not labelled as x, a, n or d, several analyses were carried out in order to determine its degree of integration into the vocabulary of the respective language. The first of these consociation-eliciting tests applies to the item’s motivatability. As far as the depth of analysis is concerned, 169 the usual procedure is to subdivide the words into the two constituents of the highest analysis level. 170 As can be seen below, the motivating constituents for each list word, e.g. develop and -ment for development, are noted down in detail. The code in the following column of the table indicates the degree of motivatability. 171

167 The word derived is used here in a very wide reading that includes all word-formational processes, but compounds formed on the basis of numbers, e.g. zweimal ‘twice’, constitute an exception.
168 This also includes national adjectives such as griechisch ‘Greek’.
169 According to Coates (1964: 1049), each additional stage of derivation or compounding reduces motivation. In his model, endless would be more motivated than endlessness. However, one may also want to argue exactly the other way round, namely that complex word formations are more motivated than simpler ones because they contain more motivating constituents.
170 However, there are some words that require a deeper level of analysis for different reasons: for instance, gleichzeitig ‘simultaneously’ has to be analysed into gleich ‘same’, Zeit ‘time’ and the adjective-forming suffix -ig because an analysis on the highest level would result in the constituents gleich and zeitig ‘early’, the latter of which is only extremely remotely semantically related to the complex word. This heterogeneous treatment in the analysis column has no consequences on any other level.
171 Cf. Section 3.3.2 for an explanation of the codes.

Table 8: Motivational analyses: notation conventions
Analysis item | Constituents | Code
development | develop, -ment | MO
Unabhängigkeit | unabhängig, -keit | MO

The constituent column is an innovation with respect to Fill (1980), who illustrates his method with a few selected examples only, 172 and to Scheidegger (1981), who includes all his lexical material in the results section but does not indicate the lexical relations motivating the words, which makes it difficult to follow him in cases that are not immediately obvious. 173 The inclusion of detailed analyses in the present research project is therefore a conscious innovation intended to render the researcher’s analyses more transparent and to allow readers to verify the analyses and to arrive at their own conclusions. 174

3.3.1 Sources

The transparency of the procedure is of particular importance, as the existence of constituents is usually postulated on the basis of the analyst’s knowledge of the languages in question only. 175 It may surprise the reader that the Wortfamilienwörterbuch der deutschen Gegenwartssprache (Augst 1998) is not drawn upon, as this dictionary explicitly attempts to portray contemporary German word families from the point of view of the synchronic etymological competence.
However, there are several aspects that make the consultation of this dictionary less useful than one might initially expect, 176 so that no notable increase in objectivity could have been expected by using it. By contrast, relying both on the analyst’s intuition - which would have been necessary at least where items are not recorded in Augst (1998) - and the word family dictionary would rather have resulted in decreased consistency due to the fact that synchronic etymological competence is user-dependent (cf. Ickler 1999: 298) and that the editorial staff and the analyst of the present study may disagree on particular aspects. The logical conclusion is that it is preferable to have a consistent approach in the motivational analyses even though this is based on the intuition of one person only.

172 This can be explained by the fact that he uses an extremely large database.
173 For instance, it may not be immediately obvious why débrouiller, placard, auffallen and Ausweis are considered to be motivated.
174 As the lists are too long to be included in print, readers who are interested in the study’s material and database are invited to get in contact with the author either directly or via the publisher.
175 However, this has been assessed as relatively high by independent judges, and in case of doubt, the usual reference works (see Section 3.4.1.1) are consulted for information on possible meanings of constituents etc.
176 The structure of the book requires the reduction of complex words to their core word because that is where they are entered, even in semantically doubtful cases - e.g. unschlüssig ‘indecisive’ under schließen ‘to lock’ (cf. Barz 2000: 49-50). Furthermore, gradations in the degree of word-family members’ motivatability, which would be required for the present study, are not indicated anywhere (Ickler 1999: 305).
3.3.2 Motivatability codes

The nature of the motivational codes in the present study follows the prototypical threefold distinction between motivated, partially motivated and unmotivated words that is frequently found in the literature. Yet this general distinction has been refined by the addition of a certain number of subcategories in order to increase precision.

3.3.2.1 Unmotivatable items

Within the category of unmotivatable words, a further distinction can be made between items that are completely opaque morphosemantically - which are labelled UN for unmotivatable - and others that are reminiscent of existing morphemes in their form in spite of being equally opaque morphosemantically: they are put in the category UT, whose acronym stands for unmotivatable but transparent. UN items include monomorphemic words such as Blatt ‘leaf’ and Ding ‘thing’, but also historically complex words that are now obscured, e.g. the former compound lord. 177 UT items, by contrast, are always superficially complex and completely analysable by definition. 178 This category comprises pseudo-complex items, such as understand and sofort ‘immediately’, which cannot synchronically be related to under and stand or so ‘so’ and fort ‘away’ in their meaning, but only in their form. 179 Interestingly, though, these pseudo-constituents may still be felt to have a possible diachronic relation to the item in question. 180 By contrast, this is not the case for a subclass of the UT items comprising words such as einfach ‘easy’, which has even less to do with ein ‘one’ and Fach ‘tray’. Nevertheless, such possible pseudo-morphological analyses are included whenever it is possible to analyse an item completely into pseudo-constituents, and these instances are marked with the usual code UT followed by a question mark. 181 The overall reason for the introduction of the codes UT and UT? is the goal of describing the degrees of morphological motivatability in English and German as accurately as possible. Even though merely-transparent items do not enter the motivational count, they belong to the periphery of the phenomenon of consociation and ought to be described accordingly. 182

Table 9: Codes for unmotivatable items
Code | Example (English; German) | Constituents (English; German) | Explanation
U | | | unmotivatable (all together)
UN | leaf; Blatt | - | completely unmotivatable
UT | understand; sofort | [under, [stand; [so, [fort | unmotivatable + transparent
UT? | corner; einfach | [corn, [-er; [ein, [Fach | unmotivatable + transparent + far-fetched

177 See Götz (1971) for a detailed discussion of obscured compounds in English.
178 It is not enough if an item contains only one pseudo-morpheme while a remainder remains. Still, minimal formal differences between the pseudo-constituents and the corresponding word segments are tolerated, e.g. in university, which is analysed into universe and -ity.
179 Theoretically, transparency is based on either orthographic or phonetic similarity or both, but in practice orthographic pseudo-constituents are easier to spot, so that the UT items, which belong to the periphery of the research interests, are only orthography-based.
180 For both items mentioned, the etymologies in the SOED and the UW confirm this hypothesis.
181 However, the borderline between UT and UT? is a very fuzzy one, as both categories are ultimately distinguished by intuitive diachronic-etymological judgements that are not verified. Therefore, the results for both subcodes will be generally subsumed under the label UT unless stated otherwise.
182 Pseudo-constituents are preceded by an open square bracket in order to underline their special status.

3.3.2.2 Partially motivatable items

The largest part of the motivational codes is made up by the wide variety of partially motivatable items. This can be explained by the fact that this category is graded by its very nature. A whole range of factors may lead to a reduced degree of motivatability, e.g.
1. differences in pronunciation between the complex item and its constituents
2. differences in spelling between the complex item and its constituents
3. semantic differences between the complex item and its constituents
4. the fact that an unidentifiable remainder remains after the analysis of the item
5. the fact that the constituents of an item are more restricted in their usage than the complex word itself
6. the fact that the item is merely motivatable by a grammatical polyseme.

Of course, these factors can also be combined. This is reflected in the respective codes, in which the label MP, which is common to all the items in the category of partial motivatability, is supplemented by the individual application of additional codes, so that the complex code MPA, for instance, relates to partial motivatability by means of an affix only. 183 The constituents concerned are preceded by an open round bracket in the list, but this has no consequences for sorting and is only done in order to increase reader-friendliness.

If at least one of the constituents of an item differs from the complex word in both spelling and pronunciation, this is marked with an F for form, e.g. heutig ‘present-day’, because its obvious constituent heute ‘today’ is not completely included in the complex form, the final <e> being omitted. Apart from such shortening of constituents, formal differences may also include additional stress shifts, e.g. in contribute vs. contribution. While Ellis and Beaton (1993: 585) computed different measures for formal differences such as letter overlap with or without position, the present study confines itself to stating the relations between the analysis word and the consociated item. This approach is close to Hausmann (2005: 19), who believes that intelligent learners effortlessly recognise similarities between words, even if the precise description of the differences could fill several pages.
184 Derwing, Smith and Wiebe (1995: 19) also state that “the high degree of phonetic and semantic similarity between pairs such as message/messenger, proceed/procedure, fire/fiery, abstain/abstinence, and space/spatial is perfectly adequate to indicate the morphological relationships involved, despite small deviations in the spelling.”

If formal obstacles are limited to the spoken variety, 185 this is marked by the code sequence FS - with S standing for spoken, e.g. in the incompletely analysable dramatic, whose constituent drama experiences changes in the domains of stress and vowel quality. The code FW marks formal obstacles that are limited to the written variety only - with W standing for written, e.g. in always, whose constituent all is spelled with an additional <l>. 186

A subcategory of items with formal obstacles focuses on purely vowel-based differences between the complex words and their constituents. It is labelled FV, with V standing for vowel. Examples are vowel gradation or ablaut pairs such as English sing/song or German leuchten ‘shine’/Licht ‘light’. A related phenomenon, vowel mutation or umlaut, is also historically present in both languages, but synchronically, it is only clearly recognisable for the normal language user in German, where a diacritical mark consisting of two horizontal dots is added above the letters <a>, <o> and <u>, which become <ä>, <ö> and <ü> respectively. This is accompanied by a difference in pronunciation. As these differences between original and mutated vowel are rather small and far more systematic than ablaut changes, words containing an umlaut may be assumed to be more easily analysable than those including another vowel change. It therefore makes sense to mark them individually in order to calculate their effect in a separate count. The code FVU represents this special status by its combination of the hyperonymous FV code with the label U for umlaut. 187

183 See Section 3.3.2.2 for a summary of the codes of partial motivatability.
184 He emphasises that complexity in linguistic description must not be equated with complexity for the speaker’s intuition (Hausmann 2005: 19).
185 As will become clear from the definitions of the completely motivatable words, purely spoken or written differences have to be combined with other obstacles to constitute partial motivatability; otherwise the affected words are counted as instances of complete motivatability (see Section 3.3.2.3).
186 Support for distinguishing these two levels comes from findings in psycholinguistic studies and questioning of informants. According to Fill (1976: 14), subjects who associated acknowledge with knowledge or know, and whose attention was drawn to the different pronunciation, said that when they thought about it they thought about the spelling. However, it must be noted that the k-sound in this particular example is actually provided by the prefix ac- (pointed out by Hans Rainer Fickenscher).

Even though the analysis attempts to capture fine motivatability-decreasing details in order to arrive at maximally precise results, a few recurrent formal phenomena are excepted from this general treatment:
1. German verbal lemmata are treated in the analysis as if they consisted solely of their stem, i.e. the infinitive suffix -en is dropped, as its occurrence is a purely grammatical convention generally applying to German verbs. 188
2. No distinction is made between the German allographemes <ß> and <ss>. 189 Consequently, the difference between Abschluss and abschließen is marked as merely vowel-based.
3. Differences in spelling that only involve the use or non-use of capital letters are disregarded, so that staatlich ‘stately’ can be fully motivated by Staat ‘state’ and the adjectival suffix -lich. This is due to the fact that all German nouns are spelled with an initial capital letter.
The fact that spelling is to a certain extent system-related justifies the omission of this particular feature. 190
4. In British English, the phonetic realisation of a word such as consideration contains a syllable-initial r-sound that is only orthographically but not phonetically present syllable-finally in the constituent consider. As a parallel phenomenon can be observed in German - for instance in the realisation of the <r> in heran’s constituent her - position-dependent variation in the phonetic realisation of the grapheme <r> is disregarded in both languages. 191
5. Another exception applying to both English and German is the generous treatment of the letter <e>: on a strict formal level, an <e> has to be deleted in the analysis of producer into produce and -er, and in that of surprised into surprise and -ed. Also, if gebildet is claimed to consist of ge-, bild(en) and -t, an <e> has to be supplied from somewhere. Of course, it would be possible to postulate suffix variants with/without <e> respectively, but the approach followed here is to simply ignore the unstable <e> at morpheme boundaries if it is followed by a suffix beginning with <e>, or if an <e> is generated from the dropped infinitive ending, as it were.

187 Cf. Bußmann (2002) s.v. Umlaut and Ablaut and SOED s.v. umlaut and ablaut for the whole paragraph.
188 For the sake of readability, though, they are noted down in their full form in the analysis lists.
189 Since the German orthographic reform in 1998, the letter <ß> occurs after long vowels or diphthongs and the phonetically identical <ss> after short vowels. At the end of words with preceding short vowels, <ß> has been replaced with <ss>, so that older and more recent texts may differ. The situation is further complicated by the fact that <ss> is used instead of <ß> if the special character is unavailable and in words that are written exclusively in capital letters. In Swiss German, <ss> may always be used instead of <ß> (Duden 2006: 94-95).
190 Furthermore, Eckstein (2004: 65-66) mentions studies that reveal no effect of capitalising in lexical decision tests etc.
191 A further argument for this approach in English is the fact that rhotic varieties of English do not make such a distinction (Herbst, Stoll and Westermayr 1991: 209).

Semantic differences between a complex word and its constituents can decrease motivatability as well. They are marked with a hash symbol, e.g. in beinahe ‘almost’, where it is impossible to derive the meaning of the full word from the elements bei ‘at’ and nahe ‘near’, even though there is an undeniable semantic link between ‘almost’ and ‘near + at’. Similarly, the label # is used if an affix is postulated in a word with a part of speech this kind of affix does not usually attach to. 192 Very often, the MP# items are lexicalised metaphors and metonymies. Thus, airport is composed of a first constituent remotely representing the flying aspect and a second one which can be seen as a metaphorical extension of the usual nautical sense (cf. LDOCE s.v. port). The fact that airport consists of two such indirectly related constituents, while Bahnhof ‘station’ 193 requires only one metaphorical extension, is not reflected in the data, though, as the existence of one constituent with semantic irregularities is considered enough to classify a word as MP#. The same rule applies to the other categories as well.

Every so often, unmotivatable or even unidentifiable elements remain after the assignment of motivating constituents to the list words. 194 Depending on the nature of the motivating elements, this phenomenon is codified in two different ways: if an affix is the only recognisable motivating constituent within an item, this is labelled as MPA, with A standing for affix. 195 Examples are Mädchen ‘girl’, where the diminutive affix is a semantic feature of the derivative, 196 or royal, where the adjectival suffix -al provides information on the part of speech of the word. 197 The MPA label is used regardless of the semantic load of the affix 198 provided there is a certain degree of semantic compatibility between a complex word and a potential affix.

The fact that MPR items cannot be successfully analysed in their entirety either is reflected by the R in the code, which stands for the unmotivatable remainder. The code MPR can be partly defined by exclusion: it covers words containing motivating elements - with the exception of affixes - as well as an unanalysable remainder. Thus, human can be related to man from the point of view of synchronic etymological competence, but there is no prefix or word *hu in the English language.

In many cases, the unmotivatable remainder of a list item formally resembles a semantically unrelated word or affix. When this is the case, the code of partial motivatability is supplemented with a T for transparent. If the motivating element is a word, the resulting code is MPRT, e.g. in the word meanwhile: while the word while can be confidently claimed to have a synchronic morphosemantic link with the complex item, mean does not exhibit such a relation. If a word can be motivated by an affix and exhibits a transparent remainder, this is labelled as MPART, e.g. in hardly, which can be motivated by the adverbial suffix -ly, but has nothing to do with the meaning of ‘hard’.

192 This is the case in the noun representative, where the meaning of -ative ‘doing or tending to do sth’ (LDOCE s.v. -ative) is still recognisable.
193 The German word for ‘station’ is composed of Bahn ‘railway’ and Hof - literally ‘yard; court; farm’.
194 Rettig (1981: 40) also believes that words are not unanalysable from the outset if a remainder that cannot be related to other lexical elements remains.
195 Cf. von Polenz (1967: 77-78), who classifies words such as German renovieren ‘renovate’ as partially motivated through the prefix, even if the bound base cannot occur on its own.
196 While this is also true from an etymological point of view, the other original constituent, Magd ‘maid’, is no longer synchronically discernible, at least not with semantics as the starting point. There is a relation (pointed out by Hans Rainer Fickenscher) to Maid, a word for a young girl, but this is obsolete and only rarely used jocularly (cf. UW s.v. Maid).
197 The part of speech is considered part of word meaning in the approach followed here. Cf. Motsch (1995: 207), who reports regularities between particular German affixes, the word class of the bases they attach to and the word class of the resulting word-formation.
198 The situation is slightly more complicated in the case of prefixes, as these have no word-class-indicative function.

In some cases, potential constituents are more restricted in their usage than the complex word itself. This phenomenon can operate on different levels: 199
1. diachronic: even if it is possible to motivate Gemeinschaft ‘community’ synchronically via -schaft ‘group of …’ and gemein ‘common’, this meaning of the adjective is labelled as obsolescent (cf. UW s.v. gemein) 200
2. diatopic: beobachten ‘to observe’ can be partially motivated by Obacht ‘attention’, but this constituent is restricted to Southern and Austrian German (cf. LGWDaF)
3. stylistic: in contrast to the complex item Richter ‘judge’, which is stylistically neutral, the verb richten is marked as belonging to the written variety in LGWDaF. 201

Words that are either (getting) out of use in the modern varieties of English and German or restricted to a particular region may not be generally known to contemporary native speakers or learners of the language. Consequently, they are marked with an L for label, which allows filtering them out where this is more adequate. 202 This label is also applied to constituents that are marked as belonging to the written, formal or informal variety of the language only. Some constituents are not labelled in the dictionaries but are still intuitively felt to be marked. This is not surprising, considering that diasystematic labelling is not always identical in different works of reference. 203 In cases like these, the label MPLI - with I alluding to the first person pronoun - is introduced in order to keep the stylistic distinctions of the MPL label while at the same time permitting a separate treatment of the non-attested stylistic labels. 204

199 Cf. Hausmann and Wiegand (1989: 341).
200 For most contemporary speakers, the first meaning of gemein that typically springs to mind is more likely to be ‘nasty’.
201 The distribution of the labels geschr/gespr (= geschrieben ‘written’/gesprochen ‘spoken’) in LGWDaF and geh./ugs. (= gehoben ‘elevated’/umgangssprachlich ‘informal’) in UW suggests that the labels are used interchangeably, even if the LGWDaF distinction is based on medium and the UW distinction on register.
202 Nevertheless, they are included in the description of the two languages because their conscious omission would be unsystematic.
203 For example, LGWDaF refers to sich in etwas schicken ‘to resign oneself to sth’ as belonging to the written variety, while the UW labels it as obsolescent.
204 For example, the infinitive verb gebären ‘to give birth to’, which can be considered a motivating factor in the adjective geboren ‘born’, is not marked as obsolete in the dictionaries, but the fact that it is almost exclusively used in the passive and the perfect tense (cf. LGWDaF s.v. gebären) makes it an LI-label candidate.

The categories of partial motivatability introduced so far have in common that the meaning of the constituents is different from the meaning of the complex word. However, there are also instances where a list item is both formally and semantically related to a shorter word whose meaning is so similar that the potential constituent and the complex item are practically synonymous, e.g. amongst and among. This is particularly true of words that can be reduced to clippings, e.g. laboratory/lab. Even though one may argue that pairs such as those mentioned above have an identical denotation and could therefore be considered different variants of one and the same lexeme, 205 the present study takes the view that the connotative and/or stylistic meaning 206 of the members of such pairs justifies their status as words in their own right, which in turn legitimises their motivating relationship. Nonetheless, words that can be motivated either by clippings or by formally related shorter synonyms are marked with a C for clipping, and this type of motivatability is avoided where possible.

A related problem is that of grammatical polysemes. Even though some argue that words related by means of zero-derivation actually belong to the same lexeme, the approach followed here sees them as individual words: an aid is not semantically identical with to aid, and neither is Plan ‘plan’ with planen ‘to make a plan’. 207 The fact that there are two semantically similar yet formally identical items ought to be considered in the analysis of motivatability, as the omission of this aspect may disregard an important motivating factor. This kind of lexical relation is only encoded if no other motivating relation can be found, and the relevant words are marked with a Z because of the zero-derivation obtaining between the related words.
There is one exception, though, in which no MPZ motivatability is accepted: as has been mentioned before, part-of-speech assignment of grammatical words is often problematic for non-linguists and even for beginning students of linguistics. Consequently, functional words - including adverbs - that are formally identical are treated as instances of the same lexeme, which means that the preposition after cannot be motivated by the existence of the conjunction or adverb of the same form. However, motivatability via a formally identical lexical word is possible, e.g. in the case of the marginal preposition worth (cf. Quirk et al. 1985: 667), which can be motivated by the corresponding noun. Motivating relations between grammatical polysemes are only postulated if paraphrasing yields acceptable results: ‘if you paint something, you apply paint to a surface’ and ‘paint is a substance used in painting’ are both true statements. Therefore, the verb paint can be MPZ-motivated by the noun, and the opposite direction would also be accepted. On the other hand, the noun bottle cannot be motivated by the corresponding verb because the paraphrase ‘a bottle is what you use for bottling’ is too peripheral 205 Wunderli (1989: 94-95), for instance, argues that the process of clipping does not produce new lexemes. 206 Quite frequently, clippings are less formal than the full-form word (cf. Bauer 1983: 233 and Quirk et al. 1985: 1580). 207 The German infinitive ending is treated as if it were non-existent. In some instances such as Treffen ‘meeting’, though, the infinitive is retained in the grammatical polyseme, but this is not marked in any way. Design of the research project 94 in the meaning of bottle. 208 Similarly ‘to pay is to give someone their pay’ is not an acceptable statement because the verb pay is frequently used with goods and does not necessarily involve the money regularly paid for someone’s work. 209 Thus, only cases with an acceptable directionality are included. 
210 In sum, the basic categories of partial motivatability are as follows: Table 10: Partial motivatability codes Code Example Constituents Explanation MP partially motivatable (all together) MPF distribution heutig (distribute, -ion (heute, -ig partially motivatable due to differences in spelling and pronunciation MPFV fill springen (full (Sprung partially motivatable due to vowel gradation MPFVU jährlich - (Jahr, -lich partially motivatable due to vowel mutation (only in German) MP# nobody verteilen no, (body ver-, (teilen partially motivatable due to semantic differences MPA royal Mädchen (-al (-chen partially motivatable due to motivation by affixes only MPR yesterday Prozent day pro partially motivatable due to an unmotivated remainder MPRT tonight voraus [to, night vor, [aus partially motivatable due to an unmotivated but transparent remainder MPART hardly Meister [hard, (-ly [meist, (-er partially motivatable due to an affix and a transparent remainder MPL computer Gemeinschaft (compute, -er (gemein, -schaft partially motivatable due to constituents that are diasystematically marked in the dictionaries used MPLI tired weiblich (tire, -ed (Weib, -lich partially motivatable due to constituents that are diasystematically marked by the analyst MPC amongst Gehirn among, [-st [ge-, (Hirn partially motivatable by a clipping or a formally related shorter synonym MPZ list (v) Leben list (n) leben partially motivatable by a grammatical polyseme 208 The verb bottle can be motivated by the noun, though, as to bottle is ‘to fill something into a bottle’. 209 By contrast, the opposite statement ‘the pay is the money that people are paid for their work’ is accepted. 210 This is firstly because motivating relationships with an unacceptable directionality are harder to think of, so that many may have been overlooked, and secondly because they are less likely to be of importance within the psycholinguistic and didactic perspectives discussed in Chapter 6. 
Also, according to Plag (2003: 111), “it seems that for the vast majority of cases it is possible to establish the direction of conversion.”

These codes can be more or less freely combined with each other. 211 Whenever the criteria for a particular subclass are met for at least one constituent, the relevant code letters are added to the code of the entire word.

3.3.2.3 Completely motivatable items

Judging from its nature at the top end of the scale of motivatability, the category of completely motivatable items may be expected to form a homogeneous mass. Yet in spite of the fact that all members of this category must conform to the rule of complete morphosemantic analysability, it is still possible to distinguish a few subcategories that are marked with the label MO for completely motivatable, which is then supplemented by an individual code. As laid out in Section 2.2.3.1, all other conditions being met, it is enough for a word to be completely motivatable in this study if it is formally completely analysable either in its written or in its spoken form. However, differences confined to either of the two varieties are marked with separate codes in order to allow their retracing in the final analysis. Slight differences in the spelling only, e.g. in daily as against its free constituent day, are labelled MOW for written, while differences in pronunciation only, e.g. in preference vs. prefer, 212 or the stress shift in environmental vs. environment, are marked with the code MOS for spoken. If a word is fully motivatable on the basis of one or more affixes that are not contained in the reference works admitted here, 213 the word is labelled MOl if the self-elaborated affix is lexical, and MOg if the self-elaborated affix is grammatical.
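The choice between the plain code MO and the variety-specific codes MOW and MOS described above can be sketched as a small decision function. This is only an illustration, not the procedure used in the study: the function name `formal_code` and its two boolean parameters are assumptions introduced here, and the fallback to MPF for items that differ in both varieties follows the explanation given for that code in Table 10.

```python
def formal_code(written_identical: bool, spoken_identical: bool) -> str:
    """Return a motivatability code from the formal comparison alone.

    Hypothetical helper: each flag states whether the constituents recur
    unchanged in the complex word in that variety (spelling / pronunciation).
    All other conditions of complete motivatability are assumed to be met.
    """
    if written_identical and spoken_identical:
        return "MO"    # no formal difference in either variety
    if spoken_identical:
        return "MOW"   # difference in the written form only, e.g. daily vs. day
    if written_identical:
        return "MOS"   # difference in the spoken form only, e.g. preference vs. prefer
    return "MPF"       # differences in both spelling and pronunciation


# The flag values below restate the judgements reported in the text.
print(formal_code(written_identical=False, spoken_identical=True))   # daily
print(formal_code(written_identical=True, spoken_identical=False))   # environmental
```

The sketch leaves out the labels l and g for self-elaborated affixes, which record a different property of the word.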
Table 11: Full motivatability codes

Code | Examples (English / German) | Constituents (English / German) | Explanation
MO | | | fully motivatable (all together)
MOS | connection / Aktivität | connect, -ion / aktiv, -ität | only difference in pronunciation between constituents and complex word
MOW | winner / - | win, -er / - | only difference in spelling between constituents and complex word (only in English)
MOl | justice / Bedeutung | just, -ice / bedeuten, -ung | only containing a self-elaborated lexical affix
MOg | joint / später | join, -t / spät, -er | only containing a self-elaborated grammatical affix

211 The codes FS for obstacles in the spoken form and FW for obstacles in the written form can be used in combination with other codes of partial motivation, but they do not occur on their own, as items with obstacles in one variety only are classified as fully motivated (cf. below). The same is true of the codes MOl and MOg for self-elaborated lexical and grammatical affixes (cf. below).
212 Transcriptions based on OALD.
213 See Section 3.3.3.

3.3.3 Principles

A few principles underlying the analysis of motivatability deserve further explanation. For instance, it has been said before that the motivational analyses are based on the analyst’s knowledge of the languages in question, but in addition, digital and printed reference works are drawn upon where necessary. This may appear to contradict the principle of the linguistically 214 inexperienced language user, but it is required in the following cases:

a) In order to determine which affixes and combining forms are accepted in the analyses. Thus, only affixes contained in smaller dictionaries are accepted, as bound morphemes occurring only in large reference works may be too special to be known by the average language user, who is the point of departure of the analysis.
Consequently, only affixes occurring in the Oxford Advanced Learner’s Dictionary or the Longman Dictionary of Contemporary English are accepted for English, 215 and only affixes occurring in the Langenscheidt Großwörterbuch Deutsch als Fremdsprache or in the Duden Universalwörterbuch are accepted for German. 216 Nonetheless, this principle is not slavishly applied because some affixes that are intuitively recognisable in the words from the frequency lists are not recorded in any of the dictionaries above, e.g. German -ung, which occurs in Befreiung ‘liberation’ or Planung ‘planning’ - words that can be unproblematically parsed into the verbs befreien ‘to liberate’ and planen ‘to plan’ and a commonly found nominalising suffix. 217 The tables in Section 4.1.4.2 and Section 4.2.3.2 list all the synchronically detectable affixes from the English and German words, with those that are not contained in the dictionaries marked with a star. Where such self-elaborated affixes occur in a list word, the motivational code is supplemented with the label l for lexical affixes - e.g. the -t in complaint -, or g for grammatical affixes - e.g. the -st in most. 218 Thus, the existence of self-elaborated affixes within analysis items is marked, but it does not count as a factor resulting in partial motivatability. Therefore, Befreiung and Planung are classified as fully motivatable MOl.

214 In the context of the normal language user, the ambiguous term linguistic refers not to language in general, but to linguistics as the study of language.
215 Therefore, accompany can only be motivated by company, and not by ac-, even though this prefix occurs in the SOED.
216 The fact that the second dictionary drawn upon for German is a medium-sized reference work is due to the fact that German dictionaries seem to include far fewer affixes than English dictionaries, so that it is necessary to go one step higher in order to include German affixes such as -ion whose English equivalents are listed in the learner’s dictionaries already. The tables in Section 4.2.3.2 give a detailed account of the affixes that have entered the list of permissible affixes via the second dictionary.
217 The argumentation is slightly more problematic where the analysis suggests the existence of an affix in spite of the fact that no suitable base can be found. This is the case of -il, which is postulated as an adjective-forming suffix present in German stabil, agil or debil, and which is accepted as legitimate in analogy to other MPA words. However, sequences of sounds and/or letters are only interpretable as affixes if semantic and/or functional similarity - the latter being the marking of a particular part of speech - are present. Consequently, no prefix *emp- is postulated in empfangen ‘receive’, empfehlen ‘recommend’ and empfinden ‘feel’ because no common semantic core emerges.

b) In order to check the meaning of affixes. As some affixes are highly polysemous, e.g. German auf-, for which LGWDaF records seven different meanings, an analyst may only think of some of these during the time allotted to the motivational analysis of each word. Therefore, an affix recognisable in a particular word might be classified as transparent but not motivating in spite of the fact that a particular meaning of the affix, which would be recognised by the analyst at a different point in time, does actually result in motivatability. Furthermore, even though a majority of affixes can intuitively be assigned some kind of meaning, 219 the meaning of other affixes is extremely vague. The German prefixes in particular pose problems in this respect, so that it is preferable to check the possible meanings if it is doubtful whether an MPA can be postulated. 220

c) In order to check the meaning of free constituents.
Dictionaries are also sometimes used in order to determine whether a particular meaning of a free morpheme or word occurring within a complex lexeme should be considered as one of the lexical units making up this word’s meaning, or as a metaphorical or metonymic extension thereof. For example, the constituent Kreis in Arbeitskreis ‘working group’ has the basic meaning ‘circle’, but according to LGWDaF (s.v. Kreis) also that of ‘group of people doing something together’, so that the item is classified as completely motivatable and not as semantically special. 221

218 One such motivating bound element is the wh- in interrogative or relative pronouns and adverbs. Not only is it indirectly included in OALD in the lemmas wh-question and wh-word, but its existence is also justified by an entry of its own in the SOED - in contrast to the most famous English phonaestheme, sl- (cf. Firth 1964: 184). Even though the German equivalent W-Wort is used as a grammatical term, at least in primary education in Germany, it is not recorded in the dictionaries used here.
219 However, this will rarely ever encompass all the lexical units - if this term can be used with affixes - that are recorded in the dictionaries.
220 After all, in these cases, the affix alone can make all the difference between consociation and dissociation.
221 Of course, the lexicographers’ decisions underlying the establishment of lexical units in the dictionary may also be relatively subjective - otherwise, cases such as fusion, where OALD and LDOCE differ in the number of lexical units they assign to the lemma, would not be possible -, but it is preferable to rely on a second opinion in these cases.

The second principle deserving mention is the principle of maximal consociation, which is followed in the case of conflicting analyses. This ensures that all items are treated similarly benevolently in order to objectivise the comparison between English and German, which is the central aim of the present study. If several justifiable alternatives suggest themselves in the analysis of a particular item, the one reaching the highest code is chosen. The status hierarchy relating to motivatability is based on the following principles:

1. Semantic similarity between complex word and constituents has highest priority.
2. Formal similarity between complex word and constituents, and stylistic unmarkedness of constituents are both very desirable. Their order may be decided individually depending on the word in question.
3. Analysability into motivating constituents is preferable to motivation by zero-derivation.

The third principle to be discussed concerns linguistic correctness. It is the case more than once that items are synchronically analysable into segments in a way that contradicts the linguistic information at the disposal of the researcher, e.g. in instances of popular etymology. However, as the present study adheres to the principle of maximal consociation described above, and as the linguistically inexperienced language user would not even be aware of the problem, the principle of synchronic etymological competence is given priority over linguistic correctness. Consequently, and in analogy to etymologically correct MPA analyses, an agentive suffix -er is postulated in English father and mother, 222 but only in German Vater - because Mutter is female both semantically and grammatically, which means that the female gender of the corresponding article is not compatible with the masculine-noun-forming suffix -er and would require an additional suffix -in as in Sprech|er|in ‘female speaker’. However, it must be said that the majority of the analyses should also conform to the usual standards of linguistic analysis. 223

222 Even though the base cannot be given a precise meaning, the agentive suffix could combine quite understandably with a meaning such as ‘to procreate’. As it is far harder to see in what way Bruder and its English equivalent brother or English sister could be motivated by an agentive suffix, none is postulated here.
223 Moreover, the fact that the data base used in this project including the description of all the motivating constituents is made available allows other researchers to check to what degree they agree with the analyses in the light of the synchronic etymological competence.

The frequency lists being contextless, another principle needs to deal with strategies for the disambiguation of homonyms, polysemes and grammatical homonyms. As LDOCE subsumes all homonyms and polysemes with identical part of speech under the same entry, and as the “meanings in the entries are as far as possible ordered in accordance with their frequency” (LDOCE: xi), it is possible to check which of the competing meanings is most central and thus most relevant to the analysis. Unfortunately, though, there does not seem to be any reference work with a similar ordering for the German language. Consequently, the analyst’s intuition is drawn upon in obvious cases such as German beschreiben, which can mean either ‘to describe’ or ‘to write on sth.’, and where different analysts can be expected to have little hesitation in choosing the first meaning as the more frequent one. However, there are other words such as einstellen, which has such varied meanings as ‘to put into’, ‘to hire’, ‘to tune’, ‘to stop’ and ‘to adapt’, and where the tendency is less obvious. Verifying all the contexts of such items in a corpus would be far too time-consuming, but corpus data can still be used to determine the frequency of homonyms, as the detailed lemma list for the IDS Corpora offers a kind of micro-context that has a disambiguating function.
Thus the fact that it contains compounds such as Presse-Fuzzi ‘guy from the press’ and Presse-Agentur ‘press agency’, but not Zitronenpresse ‘lemon squeezer’ or Saftpresse ‘juice extractor’ makes it probable that the noun Presse ‘press’ occurs more frequently in its journalistic meaning. 224 Unfortunately, though, some items cannot be disambiguated in this way, so that it is still necessary to fall back on the researcher’s intuition. In these cases, the principle of maximal consociation is applied with the usual restrictions. 225 Grammatical homonyms from the non-PoS-tagged DWDS Core Corpus frequency list constitute a similar problem. 226 Thus, German verhalten can be interpreted as either the verb ‘to behave’ or as the homonymous adjective ‘restrained’, and for technical reasons a decision has to be taken as to what is the better alternative. Again, the intuition of the analyst is complemented by the microcontext of the detailed IDS Corpora list and the previously mentioned PoS-tagged DWDS Core Corpus list.

Last but not least, it should be noted that most constituents are in their citation form, but that inflected forms are accepted if they lead to a higher level of motivatability, e.g. in the German word sogenannt ‘so-called’, which is analysed into so ‘so’ and genannt ‘called’. Bound grammatical morphemes are also accepted as constituents of the analysis items. 227

3.4 Analysis of expandability

Once the motivatability of a word has been determined, its expandability is encoded. The procedure employed to this end differs from that used for the opposite direction of consociation particularly in the degree to which reference works are consulted.

3.4.1 Sources

While the analysis of motivatability has to rely to a large degree on the analyst’s intuition and command of the languages dealt with, the determination of expandability can be operationalised more objectively by accepting as valid expansions only those words which are documented in particular sources. 228

224 It cannot be excluded that these compounds are very rare and that Presse in its squeezing meaning occurs very frequently by itself but never in compounds, but this is rather unlikely.
225 Interestingly, most of the rather obvious decisions are not in favour of the highest degree of motivatability. This is particularly true in the case of German particle verbs, which have frequently experienced a semantic change from a concrete to a more abstract meaning (pointed out by Mechthild Habermann). Thus, even if umsetzen in the fully motivated meaning ‘to put/seat someone somewhere else’ may still be frequent in classroom discourse, the less motivated meaning ‘to put into action’ is more likely in general use.
226 Grammatical polysemes are unproblematic, as they typically take slightly different forms in German due to capitalising, infinitive marking etc. In English, both grammatical homonyms and polysemes are disambiguated by the PoS tagging.
227 Cf. Fleischer and Barz (1995: 4), who draw attention to the fact that inflectional morphemes can disambiguate word stems, e.g. steinig as an adjective in the case of steinig|er ‘more stony’ or a verb in the case of steinig|t ‘lapidates’. That inflectional affixes are useful in the interpretation of words’ role in larger syntactic units such as phrases and clauses can be most impressively illustrated by Lewis Carroll’s (1871/1970: 191-197) Jabberwocky poem (cf. Henderson 1985: 33). The approach adopted in this respect has the consequence that an item can be partially motivated by an inflectional affix only, e.g. best by the superlative suffix -st and Leute ‘people’ by its plural -e, even if there is no singular word *Leut in German.
228 It would also be possible to count as expansions all word formations occurring to the analyst, but this would result in a severe loss of precision and objectivity for the following reasons:
1. It is not always possible to think immediately of an expansion for each item, particularly in the case of the functional words, so that a purely intuitive procedure would take a lot more time.
2. If no expansion can be found, this is not to say that there is none in the language, but rather that the analyst either knows none or can remember none at the time of analysis - which means that a very different phenomenon than the one aimed at is actually measured.
3. Also, the intuitive approach may not always yield the best expansion for a particular item.
4. This procedure might be particularly prone to subconscious bias.
5. A purely intuitive approach cannot resolve whether potential expansions are idiosyncratic formations caused by unusual analogies.

The research method is always very similar due to the fact that all the electronic sources permit the use of wildcards: if an item such as frisch ‘fresh’ is entered between two asterisks standing for any number of any character each, the search yields all possible chains of characters containing the letter sequence in the search area, e.g. Frische, erfrischen, Frischmilch and ofenfrisch. With the computer providing all possible material, a relatively high degree of objectivity and precision should be attained. Nevertheless, errors cannot be completely excluded because the hits are purely form-based and not necessarily semantically related, e.g. in the case of rose, among whose potential expansions are dextrose, heterosexual and macrosegment, so that the data have to be postprocessed by a human researcher.
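The wildcard search can be imitated with Python's standard `fnmatch` module. The sketch below is not the search engine of any of the dictionaries consulted; the function name `wildcard_hits`, the toy word list and the case-insensitive matching are assumptions made for illustration. The non-matching item Fisch is added here, while all other words come from the examples above.

```python
from fnmatch import fnmatchcase

# Toy word list; a real search runs over a whole dictionary.
WORDS = ["Frische", "erfrischen", "Frischmilch", "ofenfrisch", "Fisch",
         "rose", "dextrose", "heterosexual", "macrosegment"]


def wildcard_hits(pattern, words=WORDS):
    """Return all words matching the pattern, where * stands for any
    number of any characters. Case is ignored (an assumption)."""
    return [w for w in words if fnmatchcase(w.lower(), pattern.lower())]


print(wildcard_hits("*frisch*"))
print(wildcard_hits("*rose*"))
```

The second call returns rose together with dextrose, heterosexual and macrosegment: the pattern only checks the letter sequence, so semantically unrelated hits still have to be removed by hand.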
This is particularly difficult in the case of function words, which are not only more vague from a semantic point of view, but also consist of very short and frequently employed combinations of letters, so that the many hits - e.g. the 32,752 different words in which the sequence er, the German word for ‘he’, occurs 229 - can only be skimmed through. Thus, even though reasonable care is exercised, the analyst may occasionally overlook an existing expansion. However, the consultation of more than one reference work in cases where there is no suitable expansion should minimise this effect. 230

3.4.1.1 Dictionaries

For both English and German, three monolingual dictionaries of varying sizes were employed in the search for expansions, namely one learner’s dictionary, one medium-sized general dictionary of the contemporary language and one very large reference work. 231 This was done with the aim of establishing a rough indication of the relevance of the expansions that can be found for each word: complex words that are included in the learner’s dictionaries are taken to be more central to the vocabulary of the languages than those that can only be found in the larger general dictionaries. 232 Therefore, expansions were first sought in the learner’s dictionaries, and the medium-sized and larger reference works were only consulted if the initial searches yielded no satisfactory results. 233 Even if all expansions recorded in the dictionaries contribute to their respective word’s consociation, this distinction permits the calculation of a central and a peripheral expandability rate based on the size of the respective sources. 234

228 (continued) 6. Such an approach poses the additional problem as to whether potential ad-hoc formations in a second language - in the case of the analyst of the present study, English - should be treated differently from those in the native language - here, German.
229 Source: UW.
230 A further advantage of using more than one dictionary for both languages lies in the fact that the reference works differ with respect to how the hits are presented. For instance, the UW lists all hits individually in their full form, while the Profisuche ‘professional search’ in LGWDaF only indicates the entry under which a potential expansion can be found. This makes it hard to see how the search item Rose ‘rose’ could combine with Sessel ‘armchair’ before clicking on the link and finding out that the hit corresponds to the middle segment of Bürosessel ‘office armchair’. By contrast, this system offers the advantage that it speeds up the search if the entry of the search item contains possible expansions.
231 The Wortfamilienwörterbuch (Augst 1998) is not used in the search for expansions. Even though the Handwörterbuch der deutschen Gegenwartssprache (Kempcke et al. 1984), which supplies most of its lemmas, is a medium-sized reference work, Ickler (1999: 299) and Beersmans and Meijers (1999: 506-507) criticise that lots of contemporary vocabulary is missing.

As far as German learner’s dictionaries are concerned, the limited supply makes Langenscheidts Großwörterbuch Deutsch als Fremdsprache (LGWDaF) a very obvious choice. The precise number of lemmas is not mentioned, but according to Eveline Ohneis from Langenscheidt, the 66,000 entries and expressions mentioned on the cover comprise not only the 34,500 lemmas in the strict sense, which are printed in blue, and the 7,000 run-on derivatives inside the entries, which are in bold print, but also all items in semi-bold italics, which number about 24,500. This last category comprises

1. collocations such as Abscheu/Hass/Misstrauen gegen jemanden/etwas hegen ‘to loathe/hate/distrust somebody’ (LGWDaF s.v. hegen)
2. idioms such as einem geschenkten Gaul schaut man nicht ins Maul ‘never look a gift horse in the mouth’
3. expressions such as so gut wie ‘as good as; almost’
4. construction patterns such as j-d hat etw. ‘sb. has sth.’ and j-d hat j-n als etw./zu etw. ‘sb. has sb. as sth. or is in a particular relation to sb.’ (LGWDaF s.v. haben).

This category is excluded as a source for expansions because it contains no words in the sense required for the present study. By contrast, the 30,000 equally semi-bold italicised compounds such as Kerzenlicht ‘candlelight’, which are introduced by two vertical strokes ||, count towards the complex words to draw from in LGWDaF, so that the ultimate number of potential expansions totals some 71,500.

232 This hypothesis is based on the assumption that the learner’s dictionaries attempt to document the most relevant part of the vocabulary for everyday usage and - as the number of entries by far exceeds the number of words to be learned in foreign language teaching - beyond. If Murray’s famous diagram (Oxford English Dictionary 1989: xxiv) is taken as the basis, this should comprise the core of the common words and stretch out in each of the other directions, namely scientific, foreign, dialectal, slang and technical vocabulary. A further argument for the use of learner’s dictionaries comes from the assumption that their size roughly corresponds to the receptive vocabulary of adult native speakers (cf. Section 6.1).
233 Whether a potential expansion is satisfying or not depends on idiosyncratic factors such as formal and semantic similarity, diasystematic labelling, grammatical polysemy, distribution and potential alternatives.
234 This treatment presupposes that words contained in the smaller dictionaries are also contained in the larger dictionaries. In fact, even if this is not always true, the number of deviant cases is so small as to be negligible, as the following figures, which are based on the analysis of a list of high-frequency words from the IDS subcorpus TAGGED, show: while 6.0% of the words on a test list with over 2,500 items were expandable in the UW, but not in LGWDaF, the reverse was true for only 0.4% of the items. 1.4% that were expandable in both dictionaries, but where the solution from LGWDaF was not contained in the Duden, can be added to the latter result. The symbols ^^ (3x) and ^ (12x) before some words in the expandability column occurring in both the DWDS Core Corpus and in the IDS subcorpus TAGGED relate to this pilot study.

By contrast, there are several high-standard monolingual English learner’s dictionaries on the market, of which the Longman Dictionary of Contemporary English (LDOCE) and the Oxford Advanced Learner’s Dictionary (OALD) seem to be the most established ones. 235 The decision in favour of OALD is not due to a lexical coverage that would clearly distinguish this dictionary from its competitors, but rather to the publication of the completely new 7th edition shortly before the beginning of the English vocabulary analyses. In addition, the electronic version of OALD permits the use of wildcards both at the beginning and at the end of words, while LDOCE only allows the second of these positions, 236 which is a serious drawback for the kind of task required for the present study. The cover of the paper version sold in Germany announces that OALD contains 183,500 words, expressions and meanings. At first sight, this figure seems to prevent a comparison with the LGWDaF - if it were not for the fact that it refers to an extremely vague group of items. According to James McCracken from Oxford University Press, the figure can be drastically reduced by considering the lemmas only, as only these are relevant for the present study: then, OALD contains about 32,000 entries plus 5,500 derivative lemmas, which gives a total of 37,500 lemmas. 237 Unfortunately, the number of lemmas in LGWDaF and OALD is not as similar as initially hoped, but aiming at complete coincidence here would have been unrealistic anyway.
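The figures quoted above for the two learner's dictionaries can be checked with a few lines of arithmetic. The variable names are chosen here for illustration only; the numbers themselves are those reported in the text.

```python
# LGWDaF, figures as reported in the text
lgwdaf_lemmas = 34_500      # lemmas in the strict sense (printed in blue)
lgwdaf_run_ons = 7_000      # run-on derivatives (bold print)
lgwdaf_phrasal = 24_500     # collocations, idioms, expressions, patterns
lgwdaf_compounds = 30_000   # compounds introduced by ||

# Cover figure: lemmas, run-ons and the excluded phrasal category
lgwdaf_cover = lgwdaf_lemmas + lgwdaf_run_ons + lgwdaf_phrasal

# Pool of potential expansions: lemmas, run-ons and compounds
lgwdaf_expansion_pool = lgwdaf_lemmas + lgwdaf_run_ons + lgwdaf_compounds

# OALD: entries plus derivative lemmas
oald_entries = 32_000
oald_derivatives = 5_500
oald_lemmas = oald_entries + oald_derivatives

print(lgwdaf_cover, lgwdaf_expansion_pool, oald_lemmas)
```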
As both LGWDaF and OALD fulfil the same function in their respective languages in that they are aimed at slightly advanced foreign language learners, their comparison is legitimate nevertheless.

235 At least in EFL teaching at German schools.
236 This is also true of the slightly revised 4th edition of LDOCE (2005), which was published almost at the same time as the latest OALD.
237 This count excludes the 2,000 variant entries that are only cross-references.

In the case of the larger dictionaries, the publishing companies seem to be slightly reluctant to reveal even the approximate number of entries. Both the Shorter Oxford English Dictionary (SOED) and the Duden Deutsches Universalwörterbuch (UW) mention in their prefaces that new words have been added in relation to earlier versions, but none of these sets of words is numerically delimited. The largest figure that can be found in the UW is contained in the statement that the roughly 70,000 words of the central German vocabulary are supplemented by more peripheral vocabulary - however large that may be. A search of all letters of the alphabet followed by a wildcard yields 123,772 hits in the UW, which translates as roughly the same number of lemmas due to the dictionary’s lemmatising policy. 238 The SOED CD-ROM’s cover announces sound recordings for over 100,000 headwords, but it is not possible to conclude from this what the proportion in relation to the whole dictionary is. According to James McCracken from Oxford University Press, the SOED can be determined to contain 169,000 lemmas. In spite of these size differences, both dictionaries are in a very similar position from a functional point of view:

1. They are monolingual, general dictionaries aimed at native speakers rather than learners.
2. They attempt to document the contemporary language and consider diachrony only where relevant for today’s language users.
3. They cover the central vocabulary as well as a certain amount of technical vocabulary and dialectal words from varieties outside British English and High German respectively.
4. Both dictionaries are medium-sized with respect to each language’s dictionary scene.

At the large end of the scale, the German vocabulary is recorded in the ten-volume Das große Wörterbuch der deutschen Sprache (GWDS, 1999), which contains more than 200,000 lemmas, 239 and the English vocabulary is extensively documented in the Oxford English Dictionary (OED), which comprises more than half a million words. 240 Unfortunately, the differences in size between these two dictionaries are extremely large. Consequently, corpora were drawn upon in order to provide a more comparable largest source, and because they offer some further advantages in lexical research.

238 The numbers can only be approximate because certain entries, e.g. abbreviations such as a.c., are counted twice in the UW.
239 Cf. <http://www.bibliothek.uni-regensburg.de/dbinfo/einzeln.phtml?bib_id=ub_a&colors=63&ocolors=40&titel_id=1558>, 23.11.2006.
240 Cf. <http://www.oed.com/about>, 25.05.2005.

3.4.1.2 Corpora

However large dictionaries may be, they only represent a language imperfectly because they merely contain linguistically edited data 241 and treat language as if it were static. By recording only established vocabulary, they disregard the fact that languages are dynamic systems which may produce new words. 242 Therefore, a dual approach is adopted here, which combines dictionary-consulting with the relative dynamics inherent in corpora. The corpus used in the search for English expansions is the British National Corpus. Expansions such as project-based for project are listed with their absolute frequency in the corpus - in this case, that of fifteen. Words occurring more than once in identical passages in the BNC, e.g.
suddenly-abandoned in the documents H8N 942 and HJH 947, are treated like all other hits. Usually, the expansion with the highest frequency is chosen. Orthographic variants, e.g. spellings with or without a hyphen, such as ex-employee/exemployee, are all subsumed under the same lemma in citation form, as are inflected forms. 243 Where the search was overloaded by too many possible hits, e.g. in the case of cos, only expansions with hyphenated spellings could be looked for. 244

The German corpus used in the search for expansions is the DWDS Core Corpus because its properties make it a particularly suitable candidate for comparison with the BNC: while the English and German dictionaries unfortunately differ in their number of entries, the fact that the corpora contain the same number of tokens should result in a very similar probability of encountering expansions in the two resources. Consequently, the inclusion of the corpora can be considered a necessity, as it ultimately makes the English and German sources comparable on the quantitative level. 245

241 Cf. Adams’ (2001: 14) criticism that dictionaries “are likely to give preference to words with something obscure or remarkable about their make-up and to omit regular and transparent items that users will probably not need to look up”. This is also criticised by Finkenstaedt and Wolff (1973: 25), who point out that un-derivations are given only to a limited extent in the OALD - presumably because they are considered unnecessary, as it is possible to decode such forms after having learned the principles of word-formation.
242 Surprisingly, though, the OED includes some lemmas that are labelled as nonce-words, e.g. consequenceless.
243 In the case of nouns premodifying other nouns, e.g. non-championship (race), items PoS-tagged as nouns, adjectives or ambiguous are all taken together.
244 This may have prevented the most frequent expansion being found, but it was necessary in order to obtain a result at all.
245 Note, though, that the corpus size is indicated as the number of word forms, not lemmas, which prevents absolute comparability.
The principles described above for the BNC also apply to the search in the DWDS Core Corpus. In addition, the frequency count 246 is tolerant with respect to (non-)capitalised spelling variants, e.g. when the noun Sowohl-als-Auch is also spelled Sowohl-Als-Auch, Sowohl-als-auch and Sowohl-Alsauch. 247 Unfortunately, though, the search options for the DWDS Core Corpus are not as elaborate as required for a perfect comparison with the BNC:
1. The publicly accessible DWDS Core Corpus allows its users to view only 500 hits. This is particularly inconvenient in the search for expansions, which are of a lower frequency than their bases. However, it is possible to exclude the immense number of unwanted hits - such as the high-frequency bases - by listing individual types preceded by the Boolean operator &&! . Though time-consuming, this filtering method works quite well.
2. Truncation prevents lemmatisation, which means that the search for the citation form and selected inflected variants may overlook important hits. 248 This also implies that a final lemmatised search has to be carried out for the selected expansion in order to include all its inflectional variants in the frequency count.
3. The search mechanism is sensitive both to capitalisation and to the distinction between <ss> and <ß>, so that it is necessary to carry out even more searches for each item, e.g. so as not to overlook possible expansions in which the capitalisation of the search-word’s initial letter is changed. 249
4. It is not possible to use wildcards simultaneously at the beginning and at the end of search words. This limitation is partially overcome by carrying out separate searches, but it makes it impossible to find complex words with the searched-for sequence in their middle, which is a severe drawback.
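The workarounds in the list above can be sketched as a small helper that enumerates the separate searches needed for one list word. This is only an illustration: the `&&!` exclusion operator is taken from the text, whereas the `*` wildcard character, the layout of the query string and the function names `spelling_variants` and `build_queries` are assumptions and do not reproduce the documented search syntax of the corpus.

```python
def spelling_variants(word):
    """Return case and <ss>/<ß> variants of a search word, without duplicates."""
    variants = []
    for form in (word, word.replace("ß", "ss"), word.replace("ss", "ß")):
        # Try the form as given, with a capitalised and with a lower-case initial.
        for cased in (form, form[:1].upper() + form[1:], form[:1].lower() + form[1:]):
            if cased not in variants:
                variants.append(cased)
    return variants


def build_queries(word, excluded_types=(), wildcard="*"):
    """Build one prefix search and one suffix search per spelling variant.

    A wildcard cannot stand on both sides of the search word at once,
    so two separate queries are generated for every variant.
    The wildcard character is an assumption made for this sketch.
    """
    exclusions = "".join(f" &&!{t}" for t in excluded_types)
    queries = []
    for variant in spelling_variants(word):
        queries.append(f"{variant}{wildcard}{exclusions}")  # search word at the start
        queries.append(f"{wildcard}{variant}{exclusions}")  # search word at the end
    return queries
```

Calling `build_queries("man", ["man", "Mann"])` yields four query strings, two for man and two for Man, each followed by the excluded high-frequency types. Complex words with the search word in medial position remain out of reach, as stated in point 4.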
For the above reasons, the search in the DWDS Core Corpus was supplemented by a search in the IDS Corpora, whose search functions are more flexible. As these corpora are about sixteen times as large as the BNC, they are not suited for a comparison with the English corpus, 250 but they can be used in order to find expansions with a search word as the middle sequence, which can in turn be searched for in the DWDS Core Corpus. Furthermore, list words for which no expansion could be found in the DWDS Core Corpus were subjected to a search in the IDS Corpora, and the expansions were listed with the corresponding frequency even though they do not count towards consociation in this study.
246 The search for expansions in the DWDS Core Corpus took place in January 2006.
247 Lauter-Einser-Schüler ‘a straight As pupil’ and the number-based Lauter-1er-Schüler are also counted as instances of the same lemma (source: IDS Corpora; cf. below).
248 Searching for truncated forms with all possible inflectional variants would have been too time-consuming.
249 Thus, German man ‘one’ + wildcard does not find the noun Man-selbst ‘one-self’, which occurs 31 times in the corpus.
250 Counting only IDS expansions occurring at least sixteen times in the corpus may not lead to a comparable situation either.
When looking for expansions of English and German words, one may think of considering the Internet as the largest possible electronic source. However, web-based searches have several disadvantages:
1. It is not possible to determine the precise number of words on the Internet, even at a given point in time. Furthermore, the Internet is continually and rapidly subjected to change, so that all frequency indications are immediately outdated.
2. 
Even though the German and British versions of the search engine yahoo lend themselves to contrastive research - the assumption being that their common origin makes them work in a comparable way -, it is not possible to determine if the parts of the Internet written in German and English respectively are comparable in size. However, one may expect the English part to be considerably larger, which makes comparison between the languages more difficult.
3. Many texts on the Internet, particularly the English ones, are not written by native speakers, which raises the question whether potential expansions by non-native speakers should be considered at all. If the answer is no, this is problematic because no a-priori sorting of hits according to the native language of the speakers is permitted by the usual search tools.
4. A large part of the material is not spell-checked or edited in other ways, so that the results are even more heterogeneous than those from corpora.
5. The same page can turn up repeatedly in the search engines’ hit lists, which means that the same word in the same text may be counted over and over again. 251
6. A comparison between search engines yields extremely different results. Thus, the German version of yahoo counted 40,800 hits for the word halbamtlich, while the German version of google only counted about 1,200 hits in the middle of October 2005.
7. Not only is it impossible to check the hits at the bottom of the search engines’ lists - for instance, google only allows the users to view fewer than 1,000 hits -, but the number of hits indicated on the top right also changes while the result pages are being browsed: thus, the aforementioned 40,800 hits for halbamtlich sank to a mere 28,500 and then rose again to 38,600 on the last possible page.
251 This is already problematic in the corpora, but even more so on the Internet.
These results indicate that the Internet cannot be used as a reliable corpus in the majority of linguistic analyses.
Yet Landau (1996: 78) points out that in dictionaries, run-on derivatives count as dictionary entries. As these are a selling argument, “every college dictionary includes thousands of rarely used run-on derivatives, such as oppressingly, sluggardliness, and idioticalness” (Landau 1996: 78). The implication is that not all complex words contained in the dictionaries should be accepted uncritically, and that it is desirable to check whether words such as pre-imagination or unincreased are ever used outside the dictionary. The corpus employed to this end should be very large and include technical vocabulary from a variety of domains, among other things, so that the Internet as the largest and most easily accessible corpus worldwide clearly suggested itself in spite of its drawbacks. Thus, 95 English and 69 German dubious dictionary-based expansions, selected at random, were checked in the respective versions of yahoo in October 2005 with the option that only sites in the corresponding language would be considered. Even though absolute frequencies 252 ranged from 20 (Italienischlektor ‘Italian teacher at university level’) via 75 (uncoped) and 21,100 (übergenug ‘more than enough’) to 1,630,000 (unfunny) hits, all complex search items occurred on the Internet, 253 and it was therefore decided that occurrence in any of the dictionaries should be considered sufficient proof of the existence of any expansion.
However, the related question of how hapaxes and other low-frequency candidates for expansion from the corpora should be treated remains. 254
252 Note that only the citation forms of the words were entered into the search engines. While this omitted English plurals, passives etc. from the counts, the German search engine found not only inflected forms such as halbamtlicher for halbamtlich ‘half-official’, but also Eintretensfall ‘the case that something happens’ for the morphologically more distantly related search word eintretendenfalls ‘in case something happens’.
253 Some of the hits actually came from dictionaries as well, namely from the online versions of the Merriam-Webster (<http://www.m-w.com>) etc. Others did not correspond to the search word in meaning, e.g. the English noun aimer, which is formally identical with the infinitive of the French verb aimer ‘to love’, present in many book titles etc. The figures are also distorted by the fact that the search engine does not distinguish between parts of speech, so that the noun freer seems to be extremely frequent because of the formally identical comparative form of the adjective free.
254 These items pose the additional problem that it is not always easy to determine whether one is dealing with a typographical mistake, particularly the omission of blanks, which yields sequences such as *andmoreover, or intentional creations such as despite-everything or no-one-ever-knows-what in “Naked Gun star Leslie Nielsen is back on another mind-boggling trail in search of no-one-ever-knows-what as bumbling Lt. Frank Drebin in the latest series of Red Rock cider adverts” (BNC: file CH5 658).
On the one hand, it is questionable whether a word occurring only once or twice allows any conclusions to be drawn on the word-family integration of its base, as it will surely be unknown to the majority of speakers of a language. On the other hand, though, it must not be forgotten that every corpus can only represent a tiny fraction of linguistic reality (Greenbaum 1988: 83-84):
We cannot expect that a corpus, however large, will always display an adequate number of examples of the phenomena relevant to a particular topic, especially when the phenomena occur relatively infrequently. […] If we are looking at syntactic data, it may be a matter of chance that a particular syntactic feature is absent or rare in our corpus. Only for very common constructions can we be certain of finding adequate evidence. 
We cannot know that our sampling is sufficiently large or sufficiently representative to be confident that the absence or rarity of a feature is significant.
If it is considered that the “major function of the corpus is […] to supply examples that represent language beyond that corpus; the investigator analyses the material to generalize about the language” (Greenbaum 1988: 83), the obvious conclusion to be drawn for hapaxes etc. is that they are indicative of the integration of vocabulary items into word families by means of expansion in spite of their low frequency. The fact that the corpus-based expansions’ frequency is noted down makes it possible to exclude expansions with a frequency below a particular threshold in separate calculations.
3.4.1.3 Source codes
In conclusion, expansions are sought in functionally equivalent English and German sources of different sizes. To keep expansions belonging to different levels apart, they are preceded by particular symbols in their respective column of the word list:
Table 12: Symbols marking the source of expansions
Symbol        English source   German source
(no symbol)   OALD             LGWDaF
°             SOED             UW
+             OED              GWDS
#             BNC              DWDS Core Corpus
%             -                IDS Corpora
3.4.2 Expandability codes
The analysis of expandability occupies two columns in the data base table, the first one documenting the best expansion for each list item found in the sources described above, e.g. dusty for dust, and the second one summarising the degree of consociation by means of a code. Four basic distinctions are made: words can be
1. motivatable, but not expandable - which is encoded with an M for motivatable
2. expandable, but not motivatable - which is marked E for expandable
3. both motivatable and expandable - which is labelled B for both
4. neither of the two - which is represented by the code N for neither.
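The four basic distinctions just listed amount to a simple two-way classification, which can be sketched as follows. The function names `integration_code` and `is_consociated` are hypothetical and serve only to illustrate the coding scheme; they are not part of the study’s data base.

```python
def integration_code(motivatable: bool, expandable: bool) -> str:
    """Return the basic word-family integration code for a list word."""
    if motivatable and expandable:
        return "B"  # both motivatable and expandable
    if motivatable:
        return "M"  # only motivatable
    if expandable:
        return "E"  # only expandable
    return "N"      # neither motivatable nor expandable


def is_consociated(basic_code: str) -> bool:
    """A word counts as consociated unless its basic code is N."""
    if basic_code not in {"M", "E", "B", "N"}:
        raise ValueError(f"unknown basic code: {basic_code!r}")
    return basic_code != "N"
```

For instance, a word for which an expansion but no motivating constituent can be found receives `integration_code(False, True)`, i.e. E, and therefore counts as consociated.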
Table 13: Word family integration codes
Code   Meaning of the code                      Consociation
M      only motivatable, but not expandable     +
E      only expandable, but not motivatable     +
B      both motivatable and expandable          +
N      neither motivatable nor expandable       -
These four basic codes categorise a word as either dissociated, which corresponds to the label N, or consociated - which is the case for all list words classified as either M, E or B. The category B represents the strongest kind of consociation, followed by M and only then by E. 255
As in the analysis of motivatability, restrictions are expressed by the addition of further codes to the basic ones. The supplementary codes attaching to B and E are summarised in Table 14. Most of the codes in Table 14 correspond to those used in the analysis of motivatability, 256 and their function should be clear from that table and the explanations in Section 3.3.2.2. However, a few notes have to be added for some of the labels, along with some further principles underlying the analysis of expandability.
1. Stress shifts only cause marking as FS if another syllable of the original word carries the stress in the expansion, as in photograph and photographer. 257 When derivatives are stressed on the affix instead of the base, e.g. on the last syllable in interessant vs. on the first syllable in uninteressant, there is also a minor stress difference within the base, as the original word stress is reduced either to a secondary accent or to nothing. Yet this is not considered an instance of stress-based partial expandability because the stress pattern of the original base is not distorted.
2. An attempt was made to find expansions that are not marked in any way. However, when expansions carry a label in the dictionaries, they are encoded with an additional L. Labels include
- domain, e.g. the business term asset-stripping
- region, e.g. British English charity shop 258
- style, e.g. formal maladministration
- rarity, e.g. rare thereamongst. 259
3. As in the analysis of motivatability, self-postulated labels are recognisable by the code LI. This procedure is particularly important for the corpus-based expansions, which can only be labelled as LI because there is no corresponding L.
4. Some label-bearing expansions are formed on the basis of a list word that is marked itself. Thus, the fact that purchaser is a formal word is not surprising, considering that purchase is labelled formal in OALD as well. In such instances, the L-code is written in lower case.
5. In analogy to motivational analysis, expansions marked with a C are formally similar (quasi-)synonyms of their base. Here, the C may be taken to stand for common core rather than for clipping. This category also includes German feminine words, which only differ from the masculine or unmarked word by the addition of the suffix -in, e.g. Außenministerin ‘female Foreign Secretary’. 260
6. If no other expansion for a particular list word can be found in any of the sources, grammatical polysemes are accepted as well, e.g. the noun maybe as an expansion of the formally identical adverb. This category, marked Z for zero-derivation, also comprises English adverbs in -ly, whose uncertain status is discussed in Section 2.3.2. As in the opposite direction of consociation, grammatical words cannot be expanded into other grammatical words to which they are related by zero-derivation.
255 While the motivating constituents are very closely tied to the analysis item, there are usually many different similarly suitable expansions, so that the link is less strong.
256 With the exception of the codes A, R and T, which are restricted to the analytic direction by their nature and do not occur among the expandability codes.
257 In this particular example, there is also an accompanying change in vowel quality.
258 The study is based on the British English variety, e.g. as far as the origin of the word list is concerned.
Consequently, expansions from this variety of English are marked with an L for reasons of systematicity, but one may argue that this should not count as a consociation-reducing factor (cf. the alternative results in Section 5.2). Where a British English word is matched by a highly similar Americanism, there is no labelling at all, as it is possible to consider this an instance of one lexeme that takes different forms in the two major varieties of English, e.g. swing door/ swinging door. Dictionary labels qualifying an expansion as especially American or British are disregarded as well. 259 Actually, this is the only English expansion classified as rare in the list. The fact that words are labelled as approving or disapproving in the dictionaries is not retained, and potential expansions such as againward (SOED) which are marked as old-fashioned or archaic in any way are not admitted to the list in view of the study’s synchronic character. 260 This treatment was adopted because it is possible to argue that the feminine form, which can be derived from practically all German nouns denoting professions, is only a variant of the general lexeme. 
Table 14: Supplementary codes for partial expandability
Code   Example (English / German)   Expansion (English / German)                  Explanation
+F     permit / jeweils             permission / jeweilig                         partially expandable due to differences in spelling and pronunciation; may involve shortening of the base
+FS    particular / aktiv           particularity / Aktivität                     partially expandable due to differences in pronunciation; may involve stress only
+FW    accompany / -                accompaniment / -                             partially expandable due to differences in spelling (only in English)
+FV    - / anlegen                  - / Anlage                                    partially expandable due to vowel gradation (only in German)
+FVU   - / ankommen                 - / Neuankömmling                             partially expandable due to vowel mutation (only in German)
+#     via / mögen                  viaduct / mitmögen 261                        partially expandable due to semantic obstacles
+L     notion / wiederum            notional (formal) / hinwiederum (veraltend)   partially expandable due to diasystematic labelling
+LI    movement / existieren        pincer movement / koexistieren                partially expandable due to postulated diasystematic labelling
+C     among / Außenminister        amongst / Außenministerin                     partially expandable into a formally related (quasi-)synonym
+Z     maybe (adv) / betragen       maybe (n) / Kaufbetrag                        partially expandable into a grammatical polyseme only or with grammatical polysemy as an intermediate step
The labels in Table 14 can be combined with each other. However, there are only few instances of partial expandability due to the fact that the choice is a lot wider than in the case of motivatability: where the motivational analysis is restricted by potential constituents, expandability may choose from an immense number of complex words. 262 Thus, the fact that an expansion is marked can be taken to mean that there is no more suitable item among the potential candidates.
3.4.3 Principles
By definition, all lemmas and subentries in the dictionaries are accepted as potential expansions. 263 This treatment has the advantage that open compounds are treated in the same way as those written as a single word or with a hyphen. Consequently, it is not necessary to decide for each word whether it is sufficiently lexicalised to merit being treated as a valid expansion 264 - an aspect that is especially important in the analysis of English, 265 particularly in lemmatised combinations of the type adjective + noun, e.g. absolute zero, whose status is unclear. Lemmatised multi-word compounds such as pay-as-you-go are also admitted. 266 However, this procedure also raises the question whether it makes sense to accept all lemmatised items without exception, e.g. in the case of
1. proper nouns such as Salvation Army
2. pronouns followed by an apostrophe and the short form of a verb, e.g. contractions such as I’ll or you’d
3. lemmas that are actually only the long form of acronyms and serve as cross-references. 267
For the sake of systematicity, entries such as those mentioned above are excluded as expansions. All other criteria being equally met, derivatives are preferred over compounds, one-word expansions over those spelled with hyphens, and hyphenated words over compounds with an intervening gap. The principles described for the dictionaries are, mutatis mutandis, also applied to corpora.
261 Additionally marked with the code L for being marked as belonging to the spoken variety.
262 For instance, about sixty legitimate expansions can be found for the German verb melden ‘report’ in the UW.
263 By contrast, examples, which may be either unmarked or in - sometimes half-bold - italics, are disregarded because it is unclear how established they are, and because the search for them requires additional steps. The situation is slightly more complicated in the case of the SOED’s subentries section, which may contain phrases, combinations and special collocations such as technical difficulty. Here, only items belonging to the latter two categories are accepted as expansions because they are spelled in bold print and defined as compounds in the SOED’s guide to the use of the dictionary - in contrast to the phrases, e.g. below ground or soon afterwards, which are in italics, and about whose status as words nothing is said.
264 It is frequently claimed that compounds without an intervening space are more established (cf. Quirk et al. 1985: 1537). However, this cannot explain variation such as that in the spelling of stomachache (LDOCE) vs. stomach ache (OALD). In any case, one must not forget that many words that are spelled as a single word in the contemporary language have been through the two-word stage as well, e.g. forever, which still has an alternative spelling with a gap, or informal oughta ‘ought to’ and kinda ‘kind of’.
265 However, this phenomenon is not restricted to English. There are several German list words for which only expansions written as two orthographic words can be found in the sources, e.g. ebenso gut ‘equally good’ or lieb haben ‘to like’.
266 By contrast, phraseologisms such as either … or and ganz und gar ‘absolutely’ as well as the English phrasal verbs are a priori excluded. After all, neither LDOCE nor OALD lemmatise phrasal verbs such as to look up or phraseologisms. Furthermore, the latter are on the borderline between lexicon and syntax and do not qualify as lexemes. In addition, there is no generally applicable retrieval method for phraseologisms from an operational point of view. Consequently, even if they may contribute to a word’s consociating force, they are excluded from the group of possible expansions. As Chapter 4 and Chapter 5 will show, this has no deep impact on the English as compared to the German results.
267 By contrast, lemmatised items that correspond to the long form of acronyms are accepted if they are primary, e.g. attention deficit disorder, where the acronym takes the form of a cross-reference and the item is explained under the long form.
Once more, the principle of maximal consociation is applied - even in a twofold manner: the expansion should correspond to the highest possible level in the smallest possible source. 268 To this end, the codes summarised in Section 3.4.2 are arranged in a hierarchical order. Semantic similarity is the ruling principle, which overrides all other considerations - including the one relating to the size of the source: if no semantically unmarked expansion can be found, sources of increasing size are considered. The same is true of items that are marked as obsolete, as these no longer belong to the synchronic vocabulary in the strictest sense. 269 Items marked as rare and grammatical polysemes are avoided where possible, but accepted if no better expansion can be found in the larger sources. By contrast, formal differences, particularly those affecting only the pronunciation or the spelling, and labels other than obsolescence are usually tolerated, so that no larger source is consulted. 270 Minor formal differences are caused by orthographic variants, some of them - at least in German - attributable to recent spelling reforms, e.g. Phantasie/ Fantasie, or Tip/ Tipp. As these variants coexist in the corpora and in the language, but not always in the dictionaries, 271 orthographic variants are treated as if there were no difference, so that Phantasie ‘imagination’ can be expanded into fantasievoll ‘imaginative’ without being marked as FW. The exceptions described in Section 3.3.2.2 also apply here. 
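The twofold principle of maximal consociation described above can be approximated by a ranking function. The following is a simplified sketch under explicit assumptions, not the procedure actually used in the study: the `Candidate` fields, the three-tier grouping and the placement of obsolete items in the last tier are hypothetical simplifications, and the length of the supplementary code merely stands in for the hierarchy of codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    word: str
    source_rank: int                 # 0 = smallest source, higher = larger source
    code: str = ""                   # supplementary codes, e.g. "" or "FVU"
    semantic_obstacle: bool = False  # would be marked with '#'
    obsolete: bool = False
    rare_or_polyseme: bool = False   # labelled rare, or a grammatical polyseme (Z)


def tier(candidate):
    """Group candidates: 0 = preferred, 1 = last resort, 2 = least suitable."""
    if candidate.semantic_obstacle or candidate.obsolete:
        return 2  # assumption: both trigger the search in larger sources
    if candidate.rare_or_polyseme:
        return 1  # avoided where possible
    return 0      # formal differences and other labels are tolerated


def pick_expansion(candidates):
    """Choose the best tier, then the smallest source, then the shortest code."""
    if not candidates:
        return None
    return min(candidates, key=lambda c: (tier(c), c.source_rank, len(c.code)))
```

Assuming for illustration that städtisch (code FVU) and Hauptstadt (no supplementary code) are both found in the smallest source, `pick_expansion` returns Hauptstadt because its code is shorter. A semantically obstructed candidate from a small source loses against a merely labelled candidate from a larger one.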
Expandability deals only with words that are morphosemantically related to the target word because little would be gained by the inclusion of transparent pseudo-expansions such as respectively/ irrespectively: in contrast to the study of transparent pseudo-constituents, which ensue from the material to be analysed and may offer insights into the field of mnemonics, the nature of expandability as an open phenomenon would require a systematic search of the entirety of complex words, while offering only few relevant results.
268 On a purely formal level, this means that the expandability code should be as short as possible - ideally, it should consist of a single letter. Therefore, Stadt ‘town, city’ is not expanded into the derivative städtisch ‘urban’, which would correspond to +FVU, but rather into the unlabelled compound Hauptstadt ‘capital’.
269 However, they do in the largest sense, as they may be known to speakers of a particular language from its literature, e.g. from fairy tales.
270 Where formal obstacles and labelled alternatives conflict, unlabelled alternatives (with the exception of marking as British English) are preferred over words with either spoken or written obstacles - thus, plenty is expanded into orthographically slightly different plentiful instead of formal aplenty -, but not over words involving changes in both varieties.
271 Thus, LGWDaF only contains Tipp, but not the former spelling variant Tip - in contrast to the UW.
3.5 Etymological analysis
The definition of dissociation and its paraphrases in Chapter 1 exhibit a strong tendency to relate the phenomenon to the origin of the vocabulary: dissociation is regarded as the result of foreign linguistic influences on the native stock of words of a language. 
272 However, even if this may seem more or less plausible with regard to the few selected examples that are usually mentioned in the literature, none of the previous works dealing with motivation has devoted a large-scale study to the relation between the motivation of particular words and their origin. Since dissociation in its strictest sense has not received the attention it deserves so far, an empirical study considering expandability as the synthetic direction of consociation and its relation with etymology as well could be expected even less. Accordingly, the present research attempts to fill this gap by correlating the results obtained in the study of dissociation with the etymology of the analysis items. The most important question to be answered is whether there is a direct relation between a word’s etymological origin and its integration into the vocabulary of a particular language. Based on the literature about dissociation, 273 it is possible to set up the hypothesis that words of Romance origin are less well integrated into Germanic languages than the native vocabulary of these languages. If this hypothesis is true, a significant difference in both analysability and expandability should be exhibited between words of Romance origin and words of Germanic origin in both English and German. Therefore, the etymology of all English and German analysis items was determined. 274
3.5.1 Sources
The etymological information is extracted from electronic dictionaries. For reasons of consistency, the few instances where reservations such as probably or perhaps are part of the etymology are treated as if there were no restrictions. 275 The etymological information for English is extracted from the electronic version of the SOED, which, by including etymological explanations or links for each entry, constitutes such a vast source of information that the consultation of the larger OED is rendered superfluous. Of course, purely etymological dictionaries, such as Onions (1966) or Ayto (1990), contain more etymological details, but the information obtained from the general dictionary is sufficient for the kind of questions addressed here.
272 See Baugh and Cable (1935/ 1993) for a detailed account of the history of the English vocabulary.
273 Cf. Chapter 1.
274 The fact that the initial letters of the words to be analysed cover the whole of the alphabet is particularly important: according to Scheler (1977: 103), loan words in English begin mainly with the letters A, C, D, E, I, N, O, P, R, S, V, X and Z, so that the consideration of words beginning with only some letters of the alphabet would distort the results.
275 Examples are dark and boy (cf. SOED). However, a characterisation such as “obscurely related” or “doubtfully connected”, e.g. in root and struggle, is not accepted (cf. SOED).
A similar approach was adopted for German: Paul’s Deutsches Wörterbuch (2002), one of the standard works in the etymological domain, contains less information on the borrowing history of individual words, and is less easily accessible than its competitor, Kluge’s Etymologisches Wörterbuch der deutschen Sprache on CD-ROM (2002). However, Kluge (2002) has a different disadvantage: while containing a vast amount of regional, obsolete and domain-specific words such as Zaine ‘basket’, Aar ‘eagle’ and Köte ‘back part of the toe of horses and cattle’, many of the words, particularly derivatives, from the high-frequency lists are not really documented. Either they are not included in the dictionary at all or they are only entered in the running text of the entry of their base word, even if an idiomatic component is involved. 276 For these reasons, and in analogy to the English part of the study, the majority of the etymological information for German was not drawn from specialised publications, but from a general dictionary. 
Even if the information that can be found in the Universalwörterbuch is more fragmentary than that in Kluge, it serves the purpose of this study reasonably well - even slightly better because
1. the UW on CD-ROM contains practically all items from the frequency list
2. vital information can be accessed more quickly owing to the fact that square brackets delimit the brief etymological section
3. it offers more convenient search facilities
4. complex words are treated as lemmas in their own right, many of which are supplied with a short piece of etymological information.
Nevertheless, Kluge is drawn upon as well in order to complement the UW where necessary, e.g. in the case of einsam ‘lonely’, where it yields more precise results, and in the case of damals ‘then’, where the UW lacks etymological information. 277
276 This is particularly user-unfriendly in the case of prefixations, whose etymology is included under the base and can therefore only be accessed after a prior analysis or via the full-text search, which may result in many irrelevant hits.
277 In the rare cases where the two dictionaries offer conflicting analyses, e.g. for German Truppe ‘troop’, an individual decision is taken. In the present example, the Romance etymology was chosen because the Germanic ancestral stages are less clear.
3.5.2 Word origin
Two kinds of etymological information are encoded in the present study: on the one hand, word origin in terms of language-group affiliation, and on the other hand, the words’ period of first attestation. In order to operationalise the information on the origin of the list items, six categories are set up, namely
1. Romance
2. Germanic
3. Germanic-and-Romance
4. Unknown or undocumented
5. Eponymic
6. Excluded because of a more exotic origin.
Such a reduced range of categories may seem surprising, but it can easily be justified. 
Thus, the 1996 version of the SOED distinguishes an impressive range of sublabels for Latin alone, of which the latest version retains the following: Anglo-Latin, Christian Latin, Classical Latin, Ecclesiastical Latin, Frankish Latin, Gallo-Latin, Iberian Latin, Late (Ecclesiastical) Latin, Late Latin, Latin, Law Latin, Lombardic Latin, Medical Latin, Medieval Latin, Old Latin, Popular Latin, Post-Classical Latin, Pseudo-Latin. The Universalduden distinguishes several subtypes of Latin as well: altlateinisch, alchimistenlateinisch, kirchenlateinisch, klassisch-lateinisch, lateinisch, latinisiert/ latinisierend, mittellateinisch, neulateinisch, spätlateinisch, vulgärlateinisch. However, it is difficult to believe that this subcategorisation, which mixes criteria with regard to period, region and register, can always be consistent. 278
278 This suspicion is confirmed by Finkenstaedt and Wolff (1973: 29), who find that “the code ‘late ME’ [= Middle English] is used only from the letter M onwards” in the SOED. However, they use the 1986 version of the SOED, which means that their conclusions cannot be directly applied to the latest edition of the dictionary, as major changes may have taken place in the meantime.
Consequently, it seems appropriate to avoid too detailed distinctions - even more so as the categories in the different dictionaries are not directly comparable for several reasons:
1. There are different numbers of categories involved for English and German.
2. As the principles for their establishment are not available, even those categories which are superficially similar - e.g. Ecclesiastical Latin and kirchenlateinisch - may actually be rather different.
3. Even seemingly corresponding pairs presumably cover different parts of the vocabulary for structural reasons. 279
This justifies the use of broad categories, which are nonetheless detailed enough to tackle the central research questions. 
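The collapsing of the dictionaries’ fine-grained Latin sublabels into one broad category can be sketched as a simple normalisation step. This is an illustration only: the function name `broad_origin`, the substring test and the decision to treat every label containing a Latin element (including Pseudo-Latin) alike are assumptions, and labels for other languages are deliberately left unclassified.

```python
def broad_origin(label):
    """Map a dictionary's Latin sublabel to the broad category 'Romance'.

    Returns None for labels this sketch does not cover.
    """
    normalised = label.strip().lower()
    # English labels contain 'latin'; the German ones contain 'latein'
    # or begin with 'latinisier' (latinisiert/latinisierend).
    if "latin" in normalised or "latein" in normalised:
        return "Romance"
    return None


# A few of the sublabels quoted in the text, from both dictionaries.
examples = ["Ecclesiastical Latin", "Late (Ecclesiastical) Latin",
            "kirchenlateinisch", "latinisierend", "vulgärlateinisch"]
origins = {label: broad_origin(label) for label in examples}
```

All five example labels end up in the same broad category, so that superficially similar but possibly non-equivalent pairs such as Ecclesiastical Latin and kirchenlateinisch no longer have to be matched against each other.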
The Romance items are relatively easy to define: they comprise words that have entered the English and German languages via Latin or any language derived from Latin, and they are marked with the letter r for Romance. 280 While English and German are treated identically on the highest level of analysis, an additional distinction is observed for the English words only: as it is sometimes claimed that in English, direct loans from Latin are situated on a higher stylistic level than loans from French, 281 words of Latin origin might be even more subject to the phenomenon of dissociation than borrowings from French. Consequently, where the information is unequivocal, 282 the French and Latin origin of the Romance list items is distinguished with the additional codes f for French and l for Latin. 283

279 According to de Saussure, the value of a sign in a system is defined by its opposition to other signs (de Saussure 1916/1960: 179). This can also be applied to categories: as the category Ecclesiastical Latin stands in opposition to more categories than kirchenlateinisch, it would cover fewer words if both were applied to the same items. This effect is further reinforced by the fact that Ecclesiastical Latin directly contrasts with Christian Latin and Late (Ecclesiastical) Latin; one may roughly assume that these three categories share among themselves the words covered by kirchenlateinisch in the German classificatory system.
280 Leisi (1999: 41-58) implicitly counts French borrowings among the Latinisms, as two of the eight examples he gives for Latin words in English, namely tolerance and friction, have actually taken the detour via French. This allows the conclusion that he uses the term Latinism in a sense corresponding to the expression Romance item in the present study.
281 According to Crystal (1995: 48), Germanic, French and Latin words expressing the same fundamental notion in English tend to differ in their style, with the Germanic word being the more popular one, while the French borrowing tends to be more literary and the Latin one has a more learned feel to it. Similarly, Görlach (1974: 105) contrasts low-frequency non-core Greek and Latin loans with completely integrated French high-frequency vocabulary in the English language. Also consider the (quasi-)synonymous lexical triplets in Crystal (1995: 124), which differ in origin and style, e.g. Germanic rise, French mount and Latin ascend. However, it has to be said that stylistic marking in LDOCE is not conclusive, as even several lexical units of Germanic rise are marked as formal or even literary. From a quantitative point of view, though, the Latin-based word is most extensively labelled.
282 It is not always possible to say with complete certainty whether a particular word has been borrowed directly from Latin, or whether it has taken a detour via French, e.g. in the case of evident (Onions 1966: introduction). Such instances are only marked as Romance.
283 The rare instances of French loan words in English that do not have a Latin but a Germanic origin are marked as Germanic with an intermediate French step, gFG.

The delimitation of the category of Germanic words is more difficult to establish. For instance, Finkenstaedt and Wolff (1973: 121-122) define Germanic words in English ex negativo:

“The definition of a good Anglo-Saxon (or Germanic) word can therefore only be a negative one: it is a word attested in the OE [= Old English] or ME [= Middle English] periods which is not Latin, or Greek, or French, or Celtic, or Old Norse, or derived from a name, or of unknown origin.”
Even though this working definition disregards possible loans from other Romance languages or even more distantly related ones, a brief look at Italian loans in the periods mentioned 284 shows that Finkenstaedt and Wolff’s definition represents a good approximation. Nevertheless, it cannot be used in the present research, as it disregards Germanic words that have entered the English language since 1500, mostly compounds and derivatives. Therefore, in both English and German, a word will be assumed to be of Germanic origin and marked with the code g if the etymological information in the reference works does not relate the word in question or any of its constituents to Latin, languages derived from Latin, Greek or other languages, with the exception of English, 285 German, “Dutch, Flemish, Frisian, the Scandinavian languages, and the earlier languages from which they have developed” (SOED s.v. Germanic). 286

284 According to Finkenstaedt and Wolff’s definition, these would have to be classified as Germanic as well, but there are only few in the SOED, e.g. ducat and tuba, so that this would not have had a major impact if the definition had been used in the present study.
285 In spite of the discussion whether English can be called a Germanic language due to its many Romance loans (cf. Görlach 1986, “Middle English - A Creole?”), it will be considered Germanic in the present study in accordance with the classification in Bußmann (2002: 190).
286 Of course, this is also only a working definition with no claims for universal validity. However, it proved to work well for the present study, as no list items came from the smaller Germanic languages (cf. Bußmann 2002: 250-251). The verb jump, for which only an onomatopoeic origin is indicated in SOED, and the interjection ha, which is only qualified as ‘natural’, are also classified as Germanic.

The definition of the remaining etymological categories is facilitated by the fact that it can draw on the two basic categories laid out above. Thus, the category Germanic-and-Romance comprises complex items that simultaneously contain at least one constituent from each of the two principal categories. This quality is reflected in the code b for both, which is applied to derivatives such as around and staatlich, but also to compounds such as gentleman and Zeitpunkt. In the following cases, though, the mention of Romance in the corresponding etymology section does not result in the categorisation of an otherwise Germanic item as b:
1. mere mention of related words in other languages without a direct derivative quality, as in alt and its translation equivalent old, which are both related to Latin alere in the respective reference works
2. loan translations from Romance languages, e.g. in German Geburtstag ‘birthday’, which is a loan translation of Latin dies natalis
3. Romance influence that is restricted to semantics, e.g. in the case of ablehnen, where the development from ‘take away something that is leaning against something else’ to ‘decline’ may be due to Latin dēclīnāre (cf. Kluge s.v. ablehnen). 287

Apart from these three major categories, a few smaller ones were set up for the sake of completeness. For instance, if the sources state that the etymology of the item in question is unknown, it is labelled u for unknown. This is the case for the words big and bird, among others. It may also happen in both German and English that etymological information is marked as uncertain in the dictionaries, e.g. in the case of Kampf ‘fight’, which is assumed to stem from Latin campus. However, as there are hardly any instances where the reference works hesitate, such etymologies are adopted nonetheless. There is only one English and no German word going back to a proper noun: guy, which can be traced back to the name of Guy Fawkes (cf. SOED s.v. guy), is marked with the code n for name.
The last category can be basically defined by exclusion: if an item is neither Germanic, nor Romance, nor both, nor of unknown origin, nor eponymic, it belongs to the category of excluded words with a more exotic origin, which are only of minor interest to the present research. Typical examples are Grenze ‘border’, which comes from Western Slavic, and car, which is of Celtic origin.

287 Of course, the same applies to Romance words with additional Germanic information in the etymology section as well.

3.5.3 Period of first attestation

Section 3.5.2 describes how the origin of the Romance words in the English language is further distinguished in order to measure internal variation in the degree of dissociation. Comments by Leisi suggest the necessity for yet another variable, namely the period of first occurrence - or rather, attestation, which is more reliable: even if Leisi (1999: 48) applies the expression hard word to Latinisms and similar words, 288 not every Romance item is necessarily a hard word. Particularly the early Greco-Latin and French borrowings such as dish, cheese and paint are now everyday words whose foreign origin is hardly recognisable (Leisi 1999: 41). Hard words in the strictest sense, however, cause difficulties for many speakers of English, and they are more likely to be found among more recent Greco-Latin and French loans, especially those that entered the English language after the Middle English period (cf. Leisi 1999: 42-49). Accordingly, it makes sense to correlate the period of first attestation and the precise origin of the list words with their consociation. 289

For the reasons outlined above, rough distinctions serve the purpose of the present study sufficiently well, so that only the three periods Old English, Middle English and later are distinguished and represented by the three codes o, m and l in a separate column. When a word’s period of first attestation can be neither found in the reference works nor deduced, 290 it is marked with an x in the corresponding column. By contrast, no distinction in terms of period is made for the German data, for even if analogous terms such as Old English and Old High German convey the impression of covering more or less the same span of time, differences increase with the subsequent periods, as can be seen from Table 15. 291

Table 15: The periods of the English and German languages
English periods                       German periods
Old English 292       450/700-1100    Althochdeutsch        8th century-1100
Middle English        1100-1500       Mittelhochdeutsch     1100-1350
Early Modern English  1500-1700       Frühneuhochdeutsch    1350-1600
Modern English        1700-           Neuhochdeutsch        1600-

A contrastive etymological analysis comparing the earliest period of attestation of the words on the list and making comparative statements about the degree of consociation in the corresponding periods would therefore be highly questionable. In addition, many German complex words are not provided with any kind of etymological information, so that a period-based comparison would have to remain incomplete for many items.

288 Cf. Leisi (1999: 48): “Für die Latinismen und ähnliche Wörter wird deshalb gern der Ausdruck ‘Hard Words’ gebraucht” (‘The expression “hard words” is therefore often used for Latinisms and similar words’).
289 For the sake of completeness, the period is not only encoded for the Romance, but also for the Germanic English items.
290 Cf. Section 3.5.4.
291 Cf. Herbst, Stoll and Westermayr (1991: 240) for the English and Kluge (2002, CD-ROM, Section 11.1.) for the German data.
292 The two alternative figures stand for the arrival of the Anglo-Saxons and the beginning of the literary tradition on the British Isles respectively.
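The three-way period distinction for the English items can be sketched as a small function. This is an illustration only, under several assumptions: that a year of first attestation is available as an integer, that the period boundaries are the ones given for English in Table 15 (1100 and 1500), and that a word first attested exactly in a boundary year counts towards the later period. The names period_code, OLD_ENGLISH_END and MIDDLE_ENGLISH_END are hypothetical.

```python
from typing import Optional

# Period boundaries for English as given in Table 15.
OLD_ENGLISH_END = 1100
MIDDLE_ENGLISH_END = 1500


def period_code(first_attested: Optional[int]) -> str:
    """Map a year of first attestation to the period codes o, m, l or x.

    None stands for a word whose period can be neither found nor deduced.
    Boundary years are assigned to the later period (an assumption).
    """
    if first_attested is None:
        return "x"
    if first_attested < OLD_ENGLISH_END:
        return "o"
    if first_attested < MIDDLE_ENGLISH_END:
        return "m"
    return "l"


if __name__ == "__main__":
    for year in (900, 1300, 1780, None):
        print(year, "->", period_code(year))
```

Since Early Modern English and Modern English are merged into the single code l, the 1700 boundary in Table 15 plays no role in the sketch.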
3.5.4 Principles

A few principles underlying the encoding of etymological information deserve further explanation. For instance, when words have been through several successive borrowing stages, the question that arises is whether the proximate or the original language group (cf. Hillebrand 1975: 224) should be encoded. This becomes an issue in the following cases:
1. English or German words borrowed from Germanic or Romance languages may ultimately go back to loans from non-Germanic and non-Romance languages. Thus, the English borrowing magazine from French can be traced back to Arabic via Italian.
2. English being generally classified as a Germanic language (cf. Bußmann 2002: 190), Anglicisms in German could be classified as Germanic by definition - however, due to the high percentage of Romance words in the English vocabulary, many loans from English are ultimately of Romance origin, e.g. Sex ‘sex’.
3. When a word such as dauern ‘last, take time’ is borrowed either from Middle Low German or Dutch (cf. UW s.v. dauern), but goes back to Latin durare in the last instance, it has to be decided whether the intermediate step should lead to a classification of the word as Germanic or as Romance. 293

Eventually, practical limitations suggest an approach that concentrates on the original language group: 294 as the etymology of many German words cannot be found in the dictionaries, a number of items that is by no means negligible has no officially documented proximate language. Even though it is possible to reconstruct an etymology from the constituents of these words, this only corresponds to the original language, but not necessarily to the proximate one. 295 Therefore, it seems that an approach considering the original language of the constituents of each word comes closest to the aim of identifying the etymology for as many words as possible.
However, the original-language approach for the English words is supplemented by a proximate-language approach that is based on the combination of codes.

293 The definition of Germanic words in Section 3.5.2 has the disadvantage that borrowings from Germanic languages are not recorded. However, even if systematic data is unavailable, the experience gained in the encoding procedure shows that there are only very few such cases.
294 The etymologies are not pursued down to the level of Indo-European, where differences would be blurred.
295 Even if it is highly probable that words consisting of Germanic elements only were formed in the Germanic languages themselves, Greco-Romance constituents may actually have been assembled in a Germanic language and borrowed as a whole: consider Finkenstaedt and Wolff’s (1973: 138) statement that the English word protoplasm is “a good example of a ‘modern’ German word”.

Unfortunately, though, the microstructure of the dictionaries, particularly the German ones, may induce certain mistakes with respect to the original-language approach. 296 For instance, the UW records only Middle and Old High German preliminary stages for Feiertag ‘holiday’. As one is inclined to believe that the etymological entries include all the information that is necessary to categorise a word correctly, and as the word exhibits no overtly alien features (cf. Munske 1983) and is old enough to belong to the core of the Germanic vocabulary, the obvious conclusion would be that the compound is of Germanic origin. The fact that Feier is actually a loan from Latin is not indicated in any way. Quite similarly, demokratisch ‘democratic’ is stated to be of French ancestry, with démocratique as its predecessor. However, the clearly discernible -isch is actually a Germanic suffix.
Regardless of whether the affix should be historically regarded as part of the borrowed word, which has then undergone formal changes in analogy to the Germanic suffix, the principle of the original language of the synchronically discernible constituents is chosen in order to standardise the arbitrary treatment in the reference works. 297 Consequently, both Feiertag and demokratisch are classified as Germanic-and-Romance, and the etymology of the constituents of compounds and suspicious derivatives is checked in the reference works as a rule.

Another issue to be considered is that a considerable proportion of Latin words are actually derived from Greek, e.g. Latin systema ‘system’, which has entered German as System (UW s.v. System). Due to the fact that Greek as the word’s original language is not part of the Romance language group, a strict definition of Romance would exclude such words as System. However, such a treatment is highly counter-intuitive, Greek being involved so repeatedly within and across European languages. As Leisi (1999: 44) counts Greek loans among the Latinisms, the present study follows him in treating words of Greek origin like Romance words and in labelling them with the usual r for Romance. 298 However, the additional code k 299 marks them off as Greek. Where originally Greek words have entered English via French and/or Latin, the corresponding codes F, L and K are entered in capital letters, which permits a separate sorting in the programmes Microsoft Word and Microsoft Excel. The code rk in its pure form is used
1. if a borrowing is directly from Greek, as in the case of technology
2. if it is unclear from which Romance language an originally Greek word has been borrowed into English, e.g. in the case of theatre
3. if an originally Greek word has entered English via a Romance language other than Latin or French, e.g. Italian scope.

296 However, the risk of overlooking problematic cases is reduced by the fact that the word lists are dealt with in alphabetical order.
297 After all, some analogically constructed items such as the adjective moralisch ‘moral’ are lacking an etymology in the dictionaries, and the derived etymology cannot but treat Moral as Romance and -isch as a Germanic suffix.
298 If a word consists of originally Greek constituents and other Romance elements, this is also marked as r only, e.g. in television.
299 The letter <k> is chosen because all other letters contained within the word Greek are already used in the other etymological codes.

As mentioned before, no etymological information is provided for some of the English 300 and many German complex words. One way of dealing with this problem is to omit the respective words in the etymological analyses. However, as long as the link between a complex item and its presumed constituents is as strong as that between Überraschung ‘surprise’, überraschen ‘to surprise’ and the noun-forming suffix -ung, there can be little doubt that the complex word’s etymology can be seen as a combination of the etymology of its constituents. 301 It is also possible to deduce the period of English words in cases where the base belongs to the most recent category. Thus, as expect is not attested in Middle English or earlier, it is extremely unlikely that expected should be older, and it can therefore be labelled as belonging to the category l. Furthermore, some words that have no period label in the dictionaries can be classified on the basis of the date of the oldest quotation in the OED, e.g. bathroom, which is first mentioned in 1780 and thus classified as l. Nevertheless, the deduced etymologies are kept apart from those documented in the scientific reference works by the addition of a minus sign. However, if an etymology is provided for only one of two or more German grammatical polysemes, the data is transferred to the other parts of speech without any marking. 302 And conversely, the etymologies of English words that are only mentioned in the entry of their base are labelled with a minus sign unless there are links to all constituents or equivalent information.

300 While all derivatives at the end of standard entries and in most combining entries in the SOED are dated (cf. the Guide to the use of the dictionary on the CD-ROM), only derivatives occurring as lemmas are explicitly provided with the kind of etymological information needed for the present study - though sometimes only in the form of links to their constituents, as in the case of percent, which is related to per and cent.
301 For English, this task is considerably simplified by the fact that words requiring the deduction of their etymology are situated at the end of entries in the SOED representing one of the constituents already, so that the remaining part can be found more easily.
302 Thus, Verbrechen ‘crime’ is classified as Germanic in accordance with the etymology of the verb verbrechen in the UW. However, items whose etymology is based on words to which they are related via vowel mutation or vowel gradation, e.g. ausdrücken ‘to express’/Ausdruck ‘expression’, are supplemented with the usual minus sign.

Another principle that deserves explanation is the etymological encoding of complex words. It is obvious that monomorphemic words are represented by a single etymological code in an approach which is based on the etymology of the constituents. That the same applies to complex items is perhaps less obvious, but equally justified: finer distinctions would complicate matters unnecessarily. For instance, the number of constituents in a complex word and their order are irrelevant to the research question. 303 What matters is the language group(s) a word is rooted in. Furthermore, the single-code treatment makes it possible to get around the problem of how to handle compounds consisting of complex constituents.
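The single-code principle can be illustrated with a short sketch. It is not the procedure actually used in the study but an illustration under stated assumptions: the function name single_origin_code is hypothetical, the constituents are assumed to be pre-coded as g, r, e or u, the precedence rules are those spelled out in the paragraph that follows, and the relative order of e and u, which the text leaves open, is an assumption.

```python
from typing import Iterable

SUPPORTED_CODES = {"g", "r", "e", "u"}


def single_origin_code(constituent_codes: Iterable[str]) -> str:
    """Collapse the origin codes of a word's constituents into one code.

    Number and order of the constituents are deliberately ignored.
    Precedence (partly assumed): excluded > unknown > both > single group.
    """
    groups = set(constituent_codes)
    if not groups:
        raise ValueError("at least one constituent code is required")
    unsupported = groups - SUPPORTED_CODES
    if unsupported:
        raise ValueError(f"unsupported codes: {sorted(unsupported)}")
    if "e" in groups:          # one excluded constituent excludes the whole word
        return "e"
    if "u" in groups:          # one unknown constituent makes the word unknown
        return "u"
    if groups == {"g", "r"}:   # Germanic and Romance material in one word
        return "b"
    return groups.pop()        # only one group is left: g or r


if __name__ == "__main__":
    for codes in (["g", "g"], ["r", "g"], ["g", "r", "g"], ["r", "e"], ["u", "g"]):
        print(codes, "->", single_origin_code(codes))
```

Working on a set rather than a list mirrors the decision that neither the number of constituents nor their order is recorded.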
If all constituents come from the same language or language group, the code is g for Germanic, r for Romance or e for excluded. If the word in question has both Germanic and Romance constituents, the code is b, and if several constituents from two or more different language groups are mixed within the same word, it is classified as belonging to the category with the highest status: if one of the components is excluded, the whole word is classified as e. 304 If one of the components is unknown, e.g. in the case of funny, or if the dictionaries hesitate between different language groups, e.g. in the case of boot, the code applied is u for unknown. However, the use of a single code in the etymological analysis has the disadvantage that the varying complexity of the items is not recorded anywhere. 305 Consequently, compounds are marked by the addition of a capital C for compound in a separate column, which makes it possible to observe how this subgroup behaves in terms of etymology and consociation. 306

303 There is little to be gained by recording whether the constituents of b-labelled items are arranged in the order Germanic-Romance or Romance-Germanic, or even Germanic-Germanic-Romance, Germanic-Romance-Germanic or Romance-Germanic-Germanic.
304 A particularly nice example is German Pferd ‘horse’, which comes from medieval Latin par(a)veredus, itself a composite of Greek para- and late Latin veredus, the second of which is ultimately of Celtic origin.
305 With the exception of the codes UN and UT, which mark synchronic morphosemantic unmotivatability, the other motivational categories can refer to structures of varying complexity.
306 It follows from the nature of synchronic compounds that they should be analysable. However, there may be differences in the degrees of analysability and expandability. German particle verbs (cf. Bußmann 2002 s.v. Partikelverb), which are on the borderline between prefixation and compounding, and words containing affixoids (cf. Bußmann 2002 s.v. Affixoid, Präfixoid and Suffixoid), i.e. historically free morphemes that have acquired a more or less suffix-like status, are not counted as compounds, but rather as derivatives.

The issue of homonymy, which has already been mentioned with reference to motivatability and expandability, is also present in the etymological domain: in both English and German, a number of words of Germanic and Romance origin happen to be formally identical in the contemporary language. Thus, the English word ball can designate either a toy or a social event. While the first of these readings is of Germanic ancestry, the second is based on a Romance loan word. However, homonyms with different etymological origins are unproblematic because they are treated like other homonyms in that the reading present in the analyses of motivatability and expandability is the one for which the etymology is recorded. In addition, homonyms whose readings are on a comparable level of probability are marked with the extra code h for homonymic in the word-origin column in order to see how frequently this phenomenon may occur. 307 Where homonyms originate in different periods, the period column is additionally labelled h as well.

Last but not least, it is worth mentioning that the etymology of the bound morphemes is included in the affix lists in Section 4.1.4.2 and Section 4.2.3.2. Affixes and affixoids whose etymology is not listed anywhere are assigned the etymology of their grammatical polysemes, so that the prefix ab- is treated like the morphosemantically related ab, and -zeug like Zeug. The etymology of the German inflectional suffixes and interfixes 308 is not systematically included in any of the reference works used here.
However, the assumption that they are all of Germanic origin is supported by the fact that Fuhrhop (2000: 202) contrasts the linking elements with a foreign element and also by the fact that linking morphemes can often be diachronically related to inflectional suffixes (Bußmann 2002 s.v. Fugenelement). Even if interfixes contribute only little to the meaning of complex words, it should be noted once more that lexical and grammatical affixes have an identical status conferred on them, 309 as the aim of the present study is to determine the etymological origin of the linguistic material that happens to be present in the listed words. In addition, such treatment avoids problems with affixes that are on the borderline between inflection and derivation. 310

307 Thus, the above-mentioned list word ball is marked - in contrast to words such as force, whose Germanic interpretation ‘waterfall’ is far less probable than its Romance reading ‘strength’.
308 These linking morphemes - such as the -s- in Geburtstag ‘birthday’ - can be used to link the constituents of German compounds. The English translation interfix for Fugenelement comes from Dohmes, Zwitserlood and Bölte (2004: 205).
309 Cf. Quirk et al. (1985: 43): “By MORPHEME we understand a minimum unit of form and meaning which may be [..] an inflection such as -s (forget + s)”, which implies that bound functional morphemes can nonetheless be considered to carry at least some kind of meaning.
310 For instance, the German suffix -t and the English suffix -ed can be used both inflectionally in the formation of the past participle, and derivationally in the creation of adjectives from verbs.

3.5.5 Etymological codes

Table 16 gives an overview of the codes used in the word-origin column of the analysis table. 311

Table 16: Word origin codes
Category                    Code
Romance                     r
Germanic                    g
Germanic-and-Romance        b
Unknown                     u
Eponymic                    n
Excluded                    e
Latin                       l
French                      f
Greek                       k
Homonym                     h
Self-elaborated etymology   -

The codes used in the classification of the periods are summarised in Table 17.

Table 17: Period codes
Category          Code
Old English       o
Middle English    m
Later             l
Undocumented      x
Homonym           h
Self-elaborated   -

The codes h and - may be freely combined with other codes.

311 For conventions regarding capitalisation see Section 3.5.4.

4 Results

The data presented in this chapter is intended to answer the questions underlying this research project. The most important of these, namely the question whether the English vocabulary is really “antisocial”, is derived directly from Leisi (1999: 51), and can be paraphrased in the terminology generally employed here as:
Is the English vocabulary dissociated?
In order to measure dissociation, it makes sense to use a reference system - in this case, one implicitly suggested by Leisi as well -, so that the above question can be re-formulated as:
Are English words more dissociated than German words?
As dissociation can be formulated as the absence of consociation, and as consociation offers internal variation, it is also possible to ask:
How important are particular parameters such as formal or semantic restrictions?
Are there any marked differences between English and German in this respect?

The database not only encodes the qualitative consociation of the list words, but also contains information from other domains, namely
- frequency of list words and of corpus expansions
- part of speech
- word length
- etymology
- source for expansions.
These aspects can be correlated with motivatability, expandability and consociation, or among each other, so as to answer questions such as the following:
Are Romance words less consociated than Germanic words?
Is there a difference in the motivatability of English words of French, Latin and Greek origin?
Are long words less likely to be expandable than short words?
Are lexical words more consociated than grammatical words?
Are there more corpus-based expansions in English or in German?
etc. 312

312 Of course there are many more conceivable questions that could be answered with the material in the database. However, as a systematic treatment in which all variables are combined with each other would be far too extensive, the following can only constitute a selection of the results that could in theory be extracted from the database.

The following sections offer a structured and detailed description of the findings of the research project. The results for both languages are presented individually and in the same order, followed by a contrastive comparison of the most important aspects. While the following sections independently account for the data obtained in the English, German and contrastive studies and discuss potential reasons underlying some of these results, Chapter 5 will discuss the most central aspects in the form of eight hypotheses about English and German.

4.1 British National Corpus

4.1.1 Frequency

The frequency of the English items varies greatly, with the most frequent item, the definite article the, occurring 6,187,267 times in the BNC, while there are only 3,320 instances of the two least frequent words in the list, namely the verb guarantee and the noun perception. The average frequency across all analysis items is 7,431. A proportional treatment of the list items in terms of their frequency would give over 1,863 times more weight to the than to perception. As this extremely large variation would distort the association of frequency and other variables, all associations with word frequency are calculated on the basis of the respective items’ rank.

4.1.2 Word length

The word length of the BNC list items measured in terms of alphanumeric characters 313 including hyphens is summarised in Table 18.
The distribution is entirely regular, starting with a very low proportion of only 0.08% one-letter words, rising quickly and steadily to 19.48% items consisting of four characters and then slowly dropping again to 0.32% in the proportion of the longest words. This distribution is reminiscent of that described by Wolff (1969: 203), who observes that the shorter the word-length, the more corresponding types there are - but of course, this is not valid down to the lowest level. The two one-letter words in the list are the determiner a and the pronoun I. At the other end of the scale, there are eight 14-letter words such as representative and recommendation. Four-character words constitute the largest group with 487 items, but they are closely followed by five- and six-letter words with 430 and 421 instances respectively. In total, words consisting of four to seven letters represent 67.00% of all list items. The average word length is 6.18 characters. 314

313 In this kind of context, the terms character and letter are used interchangeably for stylistic reasons.

Table 18: Length of the BNC analysis items in number of characters
Number of characters   Tokens   %
1                      2        0.08
2                      32       1.28
3                      154      6.16
4                      487      19.48
5                      430      17.20
6                      421      16.84
7                      337      13.48
8                      227      9.08
9                      175      7.00
10                     111      4.44
11                     71       2.84
12                     31       1.24
13                     14       0.56
14                     8        0.32

Figure 1 shows the association between word length and frequency. As expected, the highest-frequency items consist of relatively short words only, while the relative proportion of long words increases with decreasing rank. The words with a length of four to seven characters settle at a relatively stable proportion early on.
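The summary figures quoted for Table 18 can be recomputed directly from the token counts. The following lines are only a worked check, not part of the study; the counts are copied from Table 18, and all variable names are hypothetical.

```python
# Token counts per word length (in characters), copied from Table 18.
LENGTH_COUNTS = {
    1: 2, 2: 32, 3: 154, 4: 487, 5: 430, 6: 421, 7: 337,
    8: 227, 9: 175, 10: 111, 11: 71, 12: 31, 13: 14, 14: 8,
}

# Total number of list items.
total = sum(LENGTH_COUNTS.values())

# Percentage of list items for each word length, rounded as in the table.
percentages = {n: round(100 * c / total, 2) for n, c in LENGTH_COUNTS.items()}

# Combined share of the words with four to seven characters.
share_4_to_7 = round(100 * sum(LENGTH_COUNTS[n] for n in range(4, 8)) / total, 2)

# Mean word length, weighted by the number of items per length.
mean_length = round(sum(n * c for n, c in LENGTH_COUNTS.items()) / total, 2)

# Word length with the largest number of items.
modal_length = max(LENGTH_COUNTS, key=LENGTH_COUNTS.get)

if __name__ == "__main__":
    print("total items:", total)
    print("share of 4-7 characters (%):", share_4_to_7)
    print("mean length:", mean_length)
    print("largest group:", modal_length, "characters")
```

The counts add up to 2,500 items, the four-to-seven-character group accounts for 1,675 of them (67.00%), and the weighted mean is 15,459 / 2,500 = 6.1836, i.e. the 6.18 characters reported above.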
314 If the results of the high-frequency BNC items are compared to those in Finkenstaedt and Wolff (1973: 28), it will be noted that they correspond more or less to those for West’s (1953) General Service List, where four-letter words also represent the largest category, whereas the results for the whole SOED raise the word length of the largest group to eight letters.

Figure 1: Word length and rank of the BNC items (x-axis: Rank; series: word lengths 1 to 3, 4, 5, 6, 7, 8 or more)

4.1.3 Part of speech

With respect to their part of speech, the BNC analysis items show the following distribution:

Table 19: Distribution across part of speech of the BNC analysis items

Part of speech       List code            Tokens        %
noun                 n                      1215    48.60
verb                 v                       524    20.96
adjective            a                       369    14.76
adverb               adv                     206     8.24
preposition          prep                     55     2.20
determiner           det                      43     1.72
pronoun              pron                     41     1.64
conjunction          conj                     28     1.12
modal verb           modal                    12     0.48
interjection         interjection              6     0.24
infinitive marker    infinitive marker         1     0.04

As one may have expected from the literature, nouns constitute the largest category, followed by verbs and adjectives. 315 Due to their limited number, the closed-word-class items only constitute a relatively small proportion of the total number of words. However, one may expect their proportion to be considerably larger among the first 100 items in the list.

315 According to Finkenstaedt and Wolff (1973: 74), nouns represent the largest PoS category in the General Service List (49.13%), the Oxford Advanced Learner’s Dictionary (59.14%) and the SOED (60.01%). However, while verbs also come second in the General Service List, OALD and SOED show larger proportions of adjectives - 20.43% and 24.64% respectively.

Table 20: Distribution across part of speech of the 100 most frequent BNC analysis items

Part of speech       Tokens
determiner               19
verb                     16
adverb                   15
preposition              13
pronoun                  13
conjunction               9
modal verb                6
noun                      4
adjective                 4
infinitive marker         1
interjection              -

This hypothesis is confirmed by the data in Table 20. The 19 determiners form the largest category, and prepositions and pronouns also exceed 10 tokens. However, it is the lexical category of verbs that comes second here. 316 Of the adverbial items occurring in ranks one to one hundred, none is of the -ly-type most typically associated with this part of speech. 317 Instead, the list is mainly composed of monosyllabic, monomorphemic items such as then or so. The four nouns are time - which occurs at rank 53 -, year, people and way. As no single part of speech constitutes more than 20% of the types among the first 100 items, the figures in the highest-frequency band are more evenly distributed than in the whole sample.

316 Of course, the primary verbs occurring in this frequency band function as auxiliaries as well, but their double function justifies their inclusion in the category of lexical words.

317 Compare the respective line from an anonymous poem on the parts of speech: “How things are done the ADVERBS tell; / As slowly, quickly, ill or well” (cf. e.g. <https://www.happychild.org.uk/acc/tpr/mne/0011gram.htm>, 14.10.2006).

Figure 2: Part of speech and rank of the BNC items: lexical words (x-axis: Rank; series: n, v, adj, adv)

The left-hand side of Figure 2 illustrates that the roughly 250 most frequent words are not representative of the English language as a whole as far as part of speech is concerned. With the exception of irregularities in these highest-frequency ranges, though, the proportion of verbs remains relatively stable at around 20%. Both nouns and adjectives show a more or less regular increase, with the curve for nouns rising much more steeply than the curve for adjectives.
The graphs of the lexical categories can be nicely contrasted with the graphs of the functional words recorded in Figure 3, which generally show a clear tendency towards a relatively high proportion in the highest frequency ranks and a subsequent decrease that approaches 0% in the lower frequency ranks. 318 Figure 2 shows a very interesting phenomenon: the curve for the adverbs can be considered a mixture between the pattern for lexical and grammatical words, thus underlining the heterogeneous status of this category. 319 While the proportion of nouns and verbs can be expected to remain relatively stable below the ranks in the graph on the basis of their relatively horizontal curves around the rank-2,500 borderline, the slight increase in adjectives seems to be counterbalanced by the slight decrease in adverbs.

318 The abbreviation pr stands for preposition, and pn for pronoun.

319 Cf. Quirk et al. (1985: 438).

Figure 3: Part of speech and rank of the BNC items: grammatical words (x-axis: Rank; series: pr, d, pn, cj)

4.1.4 Morphology

The discussion of both compounding and affixation among the list items can be subsumed under the heading of morphology.

4.1.4.1 Compounds

Of the 2,500 English analysis items, only 50 are classified as compounds. The remaining 2,450 words are either monomorphemic, motivatable by affixes, or supported by less easily describable forms of morphosemantic motivation. None of the compounds contains a space. On the one hand this is surprising because two-word compounds are a very typical feature of the English language. On the other hand it can be explained by the tendency of established compounds to be written as one word. This explanation is also in line with the fact that only two of the 50 compounds, namely long-term and video-taped, are spelled with a hyphen, which represents the intermediate step. 320

320 According to Kilgarriff (1997: 143-146), there are practically no open compounds among the top 3,000 items in the BNC because this requires components with an extremely high frequency.

4.1.4.2 Affixes

Table 21: English prefixes and initial combining forms

Item       Example            Occurrence in OALD   Etymology   Tokens
agri-      agriculture                             r                1
centi-     (percent                                r                2
circum-    (circumstance      (*O) L               r                1
co-        (colleague                              r                1
*com-      combine                                 r                3
con-       contemporary       (*O) L               r               11
de-        (decline                                r                4
deca-      (decade                                 r                1
di-        (divide            (~*O) L              r                1
dis-       disappear                               r                6
en-        enable                                  r                7
im-        impossible                              r                1
in- 1      income                                  r                2
in- 2      independent                             r                2
inter-     international                           r                3
intro-     introduce          (*O) L               r                1
mid-       middle                                  g                2
mini-      (minor                                  r                1
mis-       mistake                                 h                1
out-       outside                                 g                5
over-      (overcome                               g                1
phono-     (telephone                              r                1
pre-       predict                                 r                5
re-        recall                                  r               15
sub-       (subsequent                             r                1
super-     (supreme                                r                1
tele-      telephone                               r                2
trans-     transfer                                r                3
un-        unable                                  g                6
uni-       (unique                                 r                1
*wh-       who                                     g                5
*with-     withdraw                                g                1

The overwhelming majority of motivatable words in the present study contain affixes of some kind. 321 Table 21, Table 22 and Table 23 list all the affix types involved in the motivational analyses together with an example word. Only affixes contained in motivatable words are counted. 322 Affixes for which no completely motivatable example word could be found are marked by an open parenthesis. As mentioned before, affixes that are either only recorded in larger dictionaries or self-elaborated are distinguished from those present in the learner’s dictionaries by means of an asterisk in the first column.

321 In the following discussion, the term affix is used indiscriminately not only to refer to prefixes and suffixes, but also to combining forms.
Affixes that do not occur in OALD, 323 which is taken as the basis for all analyses involving English learner’s dictionaries, but are recorded in LDOCE, are left unmarked as well because both dictionaries address the same target group. However, such affixes can be recognised by the code (*O) L in column three. In total, 805 words, i.e. almost one third of all list items, are motivatable by at least one of the 109 affix types that can be found in Table 21, Table 22 and Table 23. Ten words contain two affixes, so that the total number of affix tokens amounts to 815. 324 Homonymous affixes are indicated by an elevated figure following the affix. The number of tokens is based on the homonym variant for which an example is given in the second column. 325

322 Very few affixes occur only in unmotivatable but transparent words, such as -ish in publish or pro- in programme.

323 Alternatively, they may occur in OALD, but not with the required meaning, e.g. in the case of di-. This semantic divergence is signalled by the character ~.

324 Those words are century, chemical, incorporate, minor, ourselves, personality, previous, recognise, telephone and themselves. Of course, there are far more items involving more than one affix, but as the approach taken here is of a more or less binary kind, instances such as the adjective educational are analysed into education and -al only, with no further subdivision of the derivative education taking place. Two-affix words are an exception bound to idiosyncratic features of the word (cf. Section 3.3).

325 In analogy to the treatment in the dictionaries, no distinction is made between polysemous affixes. Consequently, the tables below only represent a certain degree of semantic precision, which could be increased by distinguishing many more subtypes.

Table 22: English lexical suffixes and final combining forms

Item       Example           Occurrence in OALD   Etymology   Tokens
-able      acceptable                             r                9
-age       package                                r                6
-al        conventional                           r               58
-ally      specifically                           b-               1
-ance      performance                            r                9
-ant       servant                                r                6
-ar        familiar          (*O) L               r                2
-ary       parliamentary                          r               11
-ate 1     appropriate                            r                3
-ate 2     operate                                r               17
-ation     variation                              r               26
-ative     conservative                           r                4
-cy        efficiency                             r                5
-dom 1     freedom                                g                1
-dom 2     kingdom                                g                1
-ed        armed                                  g               19
-ee        employee                               r                3
-en 1      threaten                               g                1
-en 2      golden                                 g                2
-ence      difference                             r                7
-ency      emergency                              r                1
-ent       ancient                                r               17
-er        teacher                                g               35
-ery       gallery                                r                1
-ette      cigarette                              r                1
-ful       powerful                               g                6
-fy        satisfy                                r                1
-ial       commercial                             r                4
-ian       politician                             r                1
-ible      terrible                               r                3
-ic        democratic                             r               12
-ical      historical                             r                2
*-ice      justice                                r                1
*-iety     variety                                r-               2
-ify       identify                               r                4
-ing 1     meeting           (*O) L               g               27
*-ing 2    exciting                               g               14
-ion       suggestion                             r               61
-ise       realise                                r                3
-ism       mechanism                              r                2
-ist       artist                                 r                3
*-ite      opposite                               r                1
-ition     definition                             r                8
-itude     attitude          (*O) L               r                1
-ity       community                              r               20
-ive       effective                              r               15
-ly        friendly                               g                7
-ment      movement                               r               27
-ness      awareness                              g                3
*-nomy     economy                                r                1
-ology     technology                             r                1
-or        mirror                                 r               12
-ory       statutory                              r                1
*-our      behaviour                              h                4
-ous       famous                                 r                8
-ry        ministry                               r                4
-scape     landscape                              g-               1
*-self     herself                                g-               7
-ship      leadership                             g                5
-sion      extension                              r-               3
*-t        complaint                              g                8
-th        growth                                 g               10
-tion      intervention                           r               15
-ty        beauty            (*O) L               r                5
*-ual      sexual                                 r                4
-ure       pressure                               r               13
-ward      forward                                g                1
-wards     (afterwards                            g                2
-wise      (otherwise                             g                1
-y         assembly                               r               30

Table 23: English grammatical suffixes

Item       Example           Occurrence in OALD   Etymology   Tokens
-er        better                                 g                6
-ly        greatly                                g               82
*-n        written                                g                1
-s         news              (*O) L               g                9
*-’s       its                                    g                4
*-st       most                                   g-               4
*-t        joint                                  g                1

Summing up, there are more than twice as many suffixes as prefixes. Lexical suffixes constitute the largest category with a total of 70 types and 610 tokens.
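The token-type ratios reported in Table 24 follow directly from the type and token counts of the three affix tables. As a minimal sketch, added here for illustration only and using hypothetical names (AFFIX_COUNTS, token_type_ratio, combine), the calculation can be written as follows:

```python
# (types, tokens) per affix class, as reported in Table 24.
AFFIX_COUNTS = {
    "prefixes": (32, 98),
    "lexical suffixes": (70, 610),
    "grammatical suffixes": (7, 107),
}


def token_type_ratio(types, tokens):
    """Average number of list words (tokens) per affix type."""
    return tokens / types


def combine(*groups):
    """Add up several (types, tokens) pairs into one pair."""
    return (sum(g[0] for g in groups), sum(g[1] for g in groups))


if __name__ == "__main__":
    for name, (types, tokens) in AFFIX_COUNTS.items():
        print(name, round(token_type_ratio(types, tokens), 2))

    # All suffixes = lexical + grammatical suffixes.
    suffixes = combine(AFFIX_COUNTS["lexical suffixes"],
                       AFFIX_COUNTS["grammatical suffixes"])
    print("suffixes total", suffixes, round(token_type_ratio(*suffixes), 2))

    # All affixes = prefixes + all suffixes.
    print("all affixes", combine(AFFIX_COUNTS["prefixes"], suffixes))
```

The combined totals of 109 types and 815 tokens agree with the figures given in the running text.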
Table 24: English affix types and tokens

                     Types   Tokens   Token-type ratio
Prefixes                32       98               3.06
Suffixes total          77      717               9.31
  lexical               70      610               8.71
  grammatical            7      107              15.29

The most frequent prefix is re- with 15 tokens. Of the suffixes, lexical -ion (61x) is closely followed by -al (58x), but neither comes close to the grammatical suffix -ly, the most frequent suffix, which is involved in the motivation of 82 list words. The high token-type ratio of the grammatical suffixes shows that on average, each is involved in the motivation of about 15 list items 326 - in contrast to the prefixes, each of which is re-used only about three times on average. Table 25 lists the affixes that were felt to be needed in the analysis of the list items, but which are missing from the learner’s dictionaries. The abbreviations used in the Affix type column are p for prefix, l for lexical suffix and g for grammatical suffix.

Table 25: English affixes missing from the learner’s dictionaries

Affix      Affix type   Example      Etymology   Tokens
*with-     p            withdraw     g                1
*-n        g            written      g                1
*-t        g            joint        g                1
*-ice      l            justice      r                1
*-ite      l            opposite     r                1
*-nomy     l            economy      r                1
*-iety     l            variety      r-               2
*com-      p            combine      r                3
*-’s       g            its          g                4
*-st       g            most         g-               4
*-our      l            behaviour    h                4
*-ual      l            sexual       r                4
*wh-       p            who          g                5
*-self     l            herself      g-               7
*-t        l            complaint    g                8
*-ing 2    l            exciting     g               14

Among the 16 self-elaborated affixes, there are three prefixes and 13 suffixes, nine of them lexical and four grammatical. Only one of these affixes, -ing, is used more than ten times, while six affixes only occur once. All in all, 16 affix types and 61 affix tokens are not recorded in the English learner’s dictionaries - which corresponds to 14.68% of all affix types and 7.48% of all affix tokens. 327

4.1.5 Etymology

The etymological analysis of the data comprises two variables: etymological origin and period of origin. These are first associated with some of the other variables introduced above, and then with each other.
326 The particularly large number of words containing the grammatical suffix -ly distorts this average figure.

327 These figures are higher than may have initially been expected in view of the high standards found in English learner lexicography.

4.1.5.1 Etymological origin

Table 26: Etymological origin of the BNC items

Category                Code     Tokens        %
Germanic                total       958    38.32
                        g           908
                        g-           15
                        gFG          21
                        gFGh          1
                        gFLG          2
                        gFLKG         1
                        gh            8
                        gRG           1
                        gRGh          1
Romance                 total      1363    54.52
                        r           270
                        r-            2
                        rf          740
                        rfh           3
                        rFK           1
                        rFLK         27
                        rGK           2
                        rh            2
                        rk           32
                        rk-           1
                        rl          250
                        rlh           2
                        rLK          30
                        rLKh          1
Germanic-and-Romance    total       142     5.68
                        b            31
                        b-           12
                        bf           51
                        bf-          18
                        bFLK          2
                        bk            1
                        bl           15
                        bl-          12
Excluded                total        18     0.72
                        e            18
Eponymic                total         1     0.04
                        n             1
Unknown                 total        18     0.72
                        u            18

Most of the codes listed in Table 26 should be clear from the explanations in the methods section. Note, though, that the approach of recording changes between the Germanic and Romance language families sometimes results in relatively complex codes, such as gFLKG for the word machine. 328 Table 26 illustrates that the English vocabulary is predominantly Romance, even as far as its most common part is concerned: of the 2,500 most frequent words in the British National Corpus, only about 38% can be considered to have a Germanic background 329 - as against more than 54% that are ultimately of Romance origin. To this can be added roughly 6% with a mixed Germanic-and-Romance background, as well as the three remaining minor categories, each of which covers less than 1% of the list words. 330 The largest subcategory is that of the purely Germanic words, comprising 908 representatives and 36.32% of the analysis items, followed by the 740 words, i.e. 29.60%, belonging to the subcategory rf. The figures for the Romance subcategories French, Latin and Greek in Table 27 summarise all codes that involve the respective subcode F, L and K, regardless of whether it stands on its own or is combined within a b-code etc.
Thus, bf-words are counted as French, and b-words are classified as unspecified Romance. Items belonging to several categories simultaneously, e.g. bFLK, are placed in each of the subcategories involved. Table 27 exemplifies quite clearly that the majority of the Romance-based words have entered the English language via French. The 98 Greek words come last, but still constitute 3.92% of all list words. Surprisingly, words of Latin and of unspecified Romance origin are represented in very similar numbers, at between 300 and 350 words each. However, if the fact that the French words are almost exclusively from Latin is taken into consideration (cf. Wunderli 1989: 33-34), and if the unspecified French tokens with the codes bf, bf-, rf and rfh are therefore added to the words of Latin origin, the resulting 1,154 tokens constitute the largest category. As far as the proximate language is concerned, French is therefore the most important influence, but in terms of the original language, Latin occupies the most important place.

328 This particular code can be read in the following way: for technical reasons, the first letter indicates that it is a Germanic word - which can also be seen from the code-final capital G. The intermediate F, L and K record that machine has entered English via French, that French has borrowed it from Latin and Latin from Greek respectively, but that according to the information in the SOED, it ultimately has a Germanic origin - namely the verb may.

329 Scheler (1977: 9) finds that a count of the tokens in a short English text yields a higher proportion of Germanic words than a count of its types. By basing itself on the most frequent words of the language, the present study deals with types, but with types that are based on the lexemes with the largest number of tokens. It thus represents an intermediate approach between purely type- and purely token-based approaches.

330 There are 18 words among the list items that are Germanic-and-Romance homonyms. These are always counted under the language family of the reading adhered to in the analyses.

Table 27: Romance subcategories of the BNC items

Category               Code     Tokens
French                 total       867
                       bf           51
                       bf-          18
                       bFLK          2
                       gFG          21
                       gFGh          1
                       gFLG          2
                       gFLKG         1
                       rf          740
                       rfh           3
                       rFK           1
                       rFLK         27
Latin                  total       342
                       bFLK          2
                       bl           15
                       bl-          12
                       gFLG          2
                       gFLKG         1
                       rFLK         27
                       rl          250
                       rlh           2
                       rLK          30
                       rLKh          1
Greek                  total        98
                       bFLK          2
                       bk            1
                       gFLKG         1
                       rFK           1
                       rFLK         27
                       rGK           2
                       rk           32
                       rk-           1
                       rLK          30
                       rLKh          1
Unspecified Romance    total       319
                       b            31
                       b-           12
                       gRG           1
                       gRGh          1
                       r           270
                       r-            2
                       rh            2

Table 28: Self-elaborated etymologies of the BNC items

Category                Code     Tokens
Germanic                total        15
                        g-           15
Romance                 total         3
                        r-            2
                        rk-           1
Germanic-and-Romance    total        42
                        b-           12
                        bf-          18
                        bl-          12
total                                60

Table 28 shows that only a minority of the English analysis words - 60, i.e. 2.40% in total - have a self-elaborated etymology. Most of these words are of mixed Germanic and Romance origin. Germanic words constitute the second largest group, and Romance words come third. These figures are low enough to accept the analyses presented above as reasonably valid. There are 74 Romance, 32 Germanic, 2 homonymous and 1 Germanic-and-Romance affix types. This proportion of roughly twice as many Romance as Germanic types is very stable, even if a distinction is made with respect to the number of times each affix occurs.

Table 29: Etymological distribution and frequency of the English affix types: in percentages

                        < 10 tokens   10+ tokens
Germanic                      29.89        27.27
Romance                       66.67        72.73
Germanic-and-Romance           1.15            -
Homonym                        2.30            -

If the actual tokens are counted, the order is 531 Romance, 278 Germanic, 5 homonymous and 1 Germanic-and-Romance affix tokens. Table 30 shows the distribution of affix types and tokens in relation to the words’ etymology.
Table 30: Etymological distribution of the English affix types

                        Prefixes          Lexical suffixes    Grammatical suffixes
                        Types   Tokens    Types   Tokens      Types   Tokens
Germanic                    6       20       19      151          7      107
Romance                    25       77       49      454          -        -
Germanic-and-Romance        -        -        1        1          -        -
Homonym                     1        1        1        4          -        -

The Romance affixes yield more examples than the Germanic affixes both in terms of the number of types and tokens of prefixes and suffixes. 331

Table 31: Etymological origin and token-type-ratio of the English affixes

                        Prefixes   Lexical suffixes   Gramm. suffixes
Germanic                    3.33               7.95             15.29
Romance                     3.08               9.27                 -
Germanic-and-Romance           -               1.00                 -
Homonym                     1.00               4.00                 -

The association of the token-type-ratios with the respective etymologies suggests that the Germanic grammatical suffixes are used in the production of the largest number of tokens in the list. Furthermore, the lexical suffixes yield more representatives than the prefixes, regardless of their etymologies. While the Germanic prefixes are slightly more productive than the Romance prefixes, the Romance lexical suffixes are slightly more productive than the Germanic lexical suffixes, but in both cases, the differences are relatively small. If only the lexical affixes are considered, the average token-type-ratio for the Germanic and the Romance affixes is very similar, namely 6.84 and 7.18 respectively, with a slight predominance of the Romance affixes. Of the 805 words containing an affix, 522 are classified as Romance, 156 as Germanic, 123 as Germanic-and-Romance, and two each as excluded and unknown. Knowing that only ten words contain two affixes, and knowing that the 531 Romance affix tokens appear in 522 words, it is possible to conclude that the Romance affixes usually combine with a Romance base because otherwise, the respective words would have been marked as Germanic-and-Romance. 332 By contrast, there are 278 Germanic affix tokens, but only 156 Germanic words containing affixes.
Therefore, 122 Germanic affixes must have combined with a Romance base or a Germanic base plus some Romance affix to form complex words of the etymological category Germanic-and-Romance - otherwise the items would have been labelled as purely Germanic. This indicates that Germanic and Romance affixes behave differently. When the etymological origin of the list items is correlated with their frequency, 95% of the 100 most frequent words turn out to have a Germanic origin. There are only four Romance words, namely just, people, very and use. Because is the only mixed word in this range - results that confirm Finkenstaedt and Wolff’s (1973: 119) finding that the high-frequency vocabulary is mainly Germanic. Considering that the sample as a whole is predominantly Romance in origin, there must be a turning point at which the decrease in Germanic and the increase in Romance items lead to identical proportions of the Germanic and the Romance element. The crossing lines in Figure 4 indicate where this line can be drawn. This point corresponds to rank 1,009. Figure 4 also illustrates that the proportion of words with a mixed etymology rises steadily, if only weakly, as the words’ frequency decreases.

331 As the grammatical suffixes are all of Germanic origin, they escape comparison.

332 Five Romance affixes escape this general rule because they are contained in the four list words whose origin is either unknown or excluded from the analysis.

Figure 4: Etymological origin and rank of the BNC items (x-axis: Rank; series: r, g, b)

4.1.5.2 Period of origin

The majority of the 2,500 most frequent lexemes in the BNC are first attested in the Middle English period, followed by those of Old English and only then by those of more recent origin. This is basically in line with Finkenstaedt and Wolff’s (1973: 110) observation that frequent words are mainly older words.
However, in the sample analysed here, the Middle English origin clearly dominates over the Old English origin. Figure 5 illustrates the association of the period of origin with the rank of the English list words. As one may have expected, words with an Old English background constitute the overwhelming majority of the highest-frequency items. Apart from a few irregularities, their proportion then drops very quickly. While items that have entered the language in the Middle English period soon reach a relatively stable level, the proportion of later additions constantly rises, so that one may postulate a crossing of the curves for words attested in Old English and more recent items soon after the rank-2,500 borderline. This should result in a predominance of more recent words in the lower-frequency levels of the lexicon.

Table 32: Period of origin of the BNC items

Category          Code     Tokens        %
Old English       total       703    28.12
                  o           696    27.84
                  oh            7     0.28
Middle English    total      1289    51.56
                  m          1285    51.40
                  mh            4     0.16
Later             total       503    20.12
                  l           487    19.48
                  l-            9     0.36
                  lh            7     0.28
No result         total         5     0.20
                  x             5     0.20

Figure 5: Period of origin and rank of the BNC items (x-axis: Rank; series: m, o, l)

4.1.5.3 Etymological origin and period of origin

Most of the words that can be traced back to the Old English period are of Germanic origin, while most words originating in the Middle English period have a Romance background. The most important information that can be extracted from Table 33 is the fact that the Romance element is so intrinsically linked with the English language that the majority of the newer formations are purely Romance. 333 However, the number of mixed Germanic-and-Romance words is constantly growing as well. This is not surprising, given that mixed items require sufficient material from both languages, which has only been the case since the late Middle English period. 334 Table 33 shows that the number of mixed items in the latest period has almost reached the level of the purely Germanic items, whose proportion steadily decreases.

Table 33: Period of origin and etymological origin of the BNC items: in percentages

                  Germ.    Rom.   G.-a.-R.   Excl.   Epon.   Unkn.   Tokens
Old English       94.17    5.26       0.14    0.14       -    0.28      703
Middle English    17.53   75.33       5.74    0.62       -    0.78     1289
Later             12.72   73.36      11.93    1.79    0.20       -      503
No result         80.00       -      20.00       -       -       -        5

The growing number of excluded items with a more exotic provenance can be explained by the most recent history of the English language, in which contact with other nations has played an increasingly important role.

333 This observation, which is based on the sufficiently large number of 503 items, could be explained by the Romance affixes being particularly productive - however, Section 4.1.5.1 shows that this is not the case as Germanic and Romance affixes have very similar token-type ratios. This result can however be taken as the basis for a more convincing explanation, namely that it takes some time for foreign affixes to be naturalised. With the Romance words and the affixes contained therein entering the English language in the Middle English period, it is only in the most recent period of English that the Romance affixes could have reached their full productivity. In order to attain the same result as the Germanic affixes, they must consequently be the most productive affixes in recent times. And indeed, the consultation of the list with the BNC items containing affixes reveals that of the 250 words entering the English lexicon after the Middle English period, there are 2 excluded, 2 unknown, 22 Germanic, 56 Germanic-and-Romance and 168 Romance items. This is a ratio of 67.20% purely Romance affixations, which can be complemented by the few instances of Romance affixes added to a Germanic base.

334 With 20 Germanic-and-Romance affixations, the adverbial suffix -ly proves to be a strong integrator of foreign words.

4.1.6 Motivatability

Table 34: The detailed English motivational codes

Category                               Code        Tokens        %
Full motivatability                    total          337    13.48
                                       MO             241     9.64
                                       MOg              1     0.04
                                       MOl             16     0.64
                                       MOS             33     1.32
                                       MOW             41     1.64
                                       MOWl             5     0.20
Partial motivatability                 total         1153    46.12
                                       MP#             90     3.60
                                       MP#R            14     0.56
                                       MP#RT           32     1.28
                                       MPA             93     3.72
                                       MPA#             9     0.36
                                       MPA#RT           8     0.32
                                       MPAFl            1     0.04
                                       MPAFS            3     0.12
                                       MPAg             5     0.20
                                       MPAl             8     0.32
                                       MPAl#RT          1     0.04
                                       MPAlRT           3     0.12
                                       MPART           49     1.96
                                       MPCRT            4     0.16
                                       MPF            141     5.64
                                       MPF#            68     2.72
                                       MPF#R            9     0.36
                                       MPF#RT          10     0.40
                                       MPFg             3     0.12
                                       MPFL             5     0.20
                                       MPFl             5     0.20
                                       MPFL#            7     0.28
                                       MPFl#            6     0.24
                                       MPFL#R           2     0.08
                                       MPFLI            4     0.16
                                       MPFLR            1     0.04
                                       MPFLRT           1     0.04
                                       MPFR            17     0.68
                                       MPFRT           10     0.40
                                       MPFS#            9     0.36
                                       MPFS#R           1     0.04
                                       MPFS#RT          2     0.08
                                       MPFSL            1     0.04
                                       MPFSl            1     0.04
                                       MPFSL#           1     0.04
                                       MPFSLI           1     0.04
                                       MPFSLI#          1     0.04
                                       MPFSR            2     0.08
                                       MPFSRT           3     0.12
                                       MPFV            18     0.72
                                       MPFV#            4     0.16
                                       MPFV#RT          2     0.08
                                       MPFW#           20     0.80
                                       MPFW#R           1     0.04
                                       MPFW#RT          3     0.12
                                       MPFWCRT          2     0.08
                                       MPFWg            3     0.12
                                       MPFWL            2     0.08
                                       MPFWl            1     0.04
                                       MPFWL#           1     0.04
                                       MPFWlR           1     0.04
                                       MPFWR            4     0.16
                                       MPFWRT           8     0.32
                                       MPL              5     0.20
                                       MPL#             2     0.08
                                       MPl#             1     0.04
                                       MPL#R            1     0.04
                                       MPL#RT           1     0.04
                                       MPLCR            2     0.08
                                       MPLCRT           5     0.20
                                       MPLI             1     0.04
                                       MPLI#            3     0.12
                                       MPLl#            1     0.04
                                       MPLRT            1     0.04
                                       MPR             11     0.44
                                       MPRT            23     0.92
                                       MPZ            317    12.68
                                       MPZ#            28     1.12
                                       MPZ#T?           1     0.04
                                       MPZFS            9     0.36
                                       MPZFS#           2     0.08
                                       MPZFST?          2     0.08
                                       MPZL             4     0.16
                                       MPZT             4     0.16
                                       MPZT?           28     1.12
No motivatability, but transparency    total          110     4.40
                                       UT              29     1.16
                                       UT?             81     3.24
No motivatability                      total          900    36.00
                                       UN             900    36.00

Motivatability is the most central and important feature of the present study. Great care was taken to develop an analytical model flexible enough to adapt to the idiosyncratic requirements of each individual word. As a logical consequence, the analysis results in a large number of different codes combining the criteria set out in Section 3.3.2.2, some of them applying to many words, such as MP# with 90 items, but others only to a single word, such as MPFWl. Table 34 lists all the motivational codes encountered in the analysis of the BNC items. These are then summarised in Table 35 and Table 36. Of the 84 different motivational codes, the most frequent one, UN for unmotivatable, occurs 900 times, but only three other codes, namely MO, MPF and MPZ, apply to more than a hundred items - which is quite a marked distinction. By contrast, 30 codes occur between three and ten times, 10 twice, and 22 codes are even item-specific with a frequency of one. This large number of subcategories, combined with the relatively low number of representatives, confirms the necessity for a very flexible classification system. This is particularly important in the treatment of partial motivatability, which comprises the largest number of subcategories, namely 75. However, such fine-grained distinctions cannot be upheld if more general statements are to be made. Consequently, only the four large motivational categories of full motivatability, partial motivatability, no motivatability and no motivatability, but transparency, are retained in the more sophisticated analyses.
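The reduction of the detailed codes to the four large categories can be pictured as a simple grouping step. The Python sketch below is an illustration only: it assumes that the first two letters of a code (MO, MP, UT, UN) identify its category, which is an inference from the codes in Table 34 rather than a rule stated in the text, and the names CATEGORY_BY_PREFIX, CODE_COUNTS, simplify and collapse are hypothetical. Only three of the 75 partial-motivatability codes are included, so the partial total of this excerpt is lower than the 1,153 items of the full table.

```python
from collections import Counter

# Assumed mapping from the two-letter code prefix to the simplified category.
CATEGORY_BY_PREFIX = {
    "MO": "full motivatability",
    "MP": "partial motivatability",
    "UT": "no motivatability, but transparency",
    "UN": "no motivatability",
}

# Token counts copied from Table 34. The MO, UT and UN groups are complete;
# the MP group is an excerpt of three codes only.
CODE_COUNTS = {
    "MO": 241, "MOg": 1, "MOl": 16, "MOS": 33, "MOW": 41, "MOWl": 5,
    "MPA": 93, "MPF": 141, "MPZ": 317,
    "UT": 29, "UT?": 81,
    "UN": 900,
}


def simplify(code):
    """Map a detailed motivational code to its simplified category."""
    return CATEGORY_BY_PREFIX[code[:2]]


def collapse(code_counts):
    """Sum the token counts of all detailed codes per simplified category."""
    totals = Counter()
    for code, tokens in code_counts.items():
        totals[simplify(code)] += tokens
    return totals


if __name__ == "__main__":
    for category, tokens in collapse(CODE_COUNTS).items():
        print(category, tokens)
```

For the three complete groups, the collapsed totals match the total rows of Table 34 (337, 110 and 900 items).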
Table 35: Motivatability: simplified table

Category                               Code   Tokens        %   Combined
Full motivatability                    MO        337    13.48   M 59.60%
Partial motivatability                 MP       1153    46.12
No motivatability, but transparency    UT        110     4.40   U 40.40%
No motivatability                      UN        900    36.00

Table 35 indicates that only 36.00% of the 2,500 list words are dissociated from the analytical point of view. In 110 cases, it is possible to relate the item in question to other English words on a purely formal basis. This corresponds to 4.40% of the total list words or to 10.89% of the unmotivatable words - a proportion that is considerably lower than may have initially been expected. At the other end of the scale, 9.64% of the words analysed are completely motivatable without any kind of restriction. To this can be added 3.84% cases with minor restrictions such as affixes that are not contained in the learner’s dictionaries or formal obstacles which are confined to the spoken or written variety only. Altogether, then, 13.48% of the 2,500 words are completely motivatable according to the definition laid out in Section 3.3.2.3. Most importantly from a quantitative point of view, 46.12% of the items belong to the category of partial motivatability. If all words that are motivatable in some way or another are counted together, it becomes obvious that English is far from being an unmotivated language: with 59.60%, about three-fifths of the sample words are consociated in the analytical direction. The fact that partial motivatability implies some kind of difficulty directly leads to the question of what obstacles must usually be overcome in the analysis in order to arrive at meaningful constituents. This question is also of particular interest with regard to the differences between English and German. In order to make more general statements, the individual codes from Table 34 are lumped together to form larger categories. The figures in Table 36 are based on all codes with the exception of the category MO. 335 The percentages in the total rows (bold in the original) relate to the total number of list items, while the remaining figures represent the category-internal proportion. 336

335 Among the incompletely analysable items, the code under consideration is RT - and not simply T -, so that T-codes occurring in MPZ words are not included. These actually constitute an alternative analysis that is only needed if motivation by a grammatical polyseme is not accepted.

336 The category of incomplete analysability is special in that the sum of the items in the rows below - in this case, 414 - is not identical with the result indicated under total, which is 353. This is due to the fact that several items contain both the code A and the code RT. Consequently, instances of MPA#RT (8x), MPAl#RT (1x), MPAlRT (3x) and MPART (49x) must be counted once only for the general percentage. The category-internal percentage, however, relates to the sum of 414.

Table 36: Partial motivatability: simplified table

Category                                   Code      Tokens        %
Formal differences                         total        398    15.92
                                           F            290    72.86
                                           FS            38     9.55
                                           FW            46    11.56
                                           FV            24     6.03
Semantic differences                       total #      341    13.64
Incomplete analysability                   total        353    14.12
                                           R             66    15.94
                                           RT           168    40.58
                                           A            180    43.48
Marked constituents                        total         53     2.12
                                           L             43    81.13
                                           LI            10    18.87
Motivatability by a clipping or a
formally related shorter synonym           total C       13     0.52
Mot. by a grammatical polyseme             total Z      395    15.80
Self-elaborated affixes                    total         42     1.68
                                           g              8    19.05
                                           l             34    80.95

It becomes obvious from the above figures that there are three factors playing an almost equally important role as difficulty-inducing features in the analysis of motivatable English words, namely formal differences with 15.92%, closely followed by incomplete analysability with 14.12%, and semantics, which constitutes an obstacle in 13.64% of the instances. 337 It must therefore be stated that no single factor is responsible for the fact that most English words are only partially motivatable.
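The discrepancy between the row sum and the total for incomplete analysability, explained in footnote 336, amounts to counting items that carry both A and RT only once. The following Python sketch, added as an illustration with hypothetical names (ROW_COUNTS, A_AND_RT_CODES, LIST_SIZE), retraces that arithmetic from the figures in Table 36 and the footnote.

```python
# Row counts for incomplete analysability, copied from Table 36.
ROW_COUNTS = {"R": 66, "RT": 168, "A": 180}

# Codes combining A and RT, with their frequencies as listed in footnote 336.
A_AND_RT_CODES = {"MPA#RT": 8, "MPAl#RT": 1, "MPAlRT": 3, "MPART": 49}

LIST_SIZE = 2500  # number of BNC analysis items

row_sum = sum(ROW_COUNTS.values())       # items counted once per code
overlap = sum(A_AND_RT_CODES.values())   # items counted under both A and RT
distinct_items = row_sum - overlap       # each affected word counted once

# Category-internal percentages relate to the row sum ...
internal = {code: round(100 * n / row_sum, 2) for code, n in ROW_COUNTS.items()}
# ... whereas the general percentage relates to the distinct items.
general = round(100 * distinct_items / LIST_SIZE, 2)

if __name__ == "__main__":
    print("row sum:", row_sum)
    print("overlap:", overlap)
    print("distinct items:", distinct_items)
    print("category-internal %:", internal)
    print("general %:", general)
```

The overlap obtained in this way is the same number of A-plus-RT combinations that the text reports elsewhere for these codes.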
Contrary to expectation, [338] formal differences only play a role in roughly one-sixth of all instances. Of these, more than 70% represent an obstacle in both the spoken and the written variety. Merely written obstacles come second with roughly 12%, closely followed by merely spoken difficulties. [339] Vowel gradation plays only a minor role with 6.03% of the formally difficult words.

Incomplete analysability, the second largest of the simplified categories, unites three different subtypes: only about 16% of the affected words feature a free basis and a remainder that cannot be connected to any other lexical element. Far more frequently, the unmotivatable remainder formally resembles other words or affixes. With more than 40% of the incompletely analysable items containing a transparent remainder, this category should not be neglected in a description of the analysability of English words [340], nor should the fact that slightly more words still contain one affix (in ten cases two affixes) but no synchronically discernible base. [341] A possible explanation for this is that words of the oral type, i.e. words of Romance origin whose base has become obscured in the course of time, are a frequent phenomenon in English. This hypothesis is confirmed by the fact that as many as 147 of the 180 words only motivatable by an affix, i.e. 81.67%, are of Romance origin.

Motivatability by a grammatical polyseme occurs in 15.80% of the list items and is regarded as a facilitating rather than as a difficulty-inducing factor. It is interesting to note that motivatability by a grammatical polyseme plays such an important role even though this type of motivatability was only admitted if no other motivating elements could be found. Similarly, words with constituents that are very doubtful from a semantic point of view were excluded from the beginning, so that semantic difficulties may actually be more important than is conveyed by the proportion of roughly 14% mentioned above.

[337] Motivatability by a grammatical polyseme does not constitute an obstacle in the sense that the other features discussed above do. It rather represents an additional motivating factor (see below).

[338] For instance, the Simplified Spelling Society demands that the English spelling should be changed because it is so difficult (<http: / / www.spellingsociety.org/ aboutsss/ leaflets/ whyeng.php>, 14.10.2006).

[339] This is also reflected in the completely motivatable items, where 46 words with written obstacles only are contrasted with 38 spoken-difficulty items.

[340] Actually, there are more cases where transparent unmotivatable elements combine with truly motivating elements (168x) than words that are purely transparent but unmotivatable (110x). This is either an inherent feature of English, or it may be attributed to the fact that the analyst is more likely to have spotted a transparent element as a remainder in an analysis than to have recognised transparent pseudoconstituents in unmotivatable words.

[341] Of course, the codes A and RT may also be combined, which actually occurs 61 times.

With a proportion of 2%, marked constituents are almost negligible. Most marked constituents are labelled by the dictionaries, and only 10 words are marked with the code LI for self-elaborated labelling. Self-elaborated affixes also occur in only 42 partially motivatable words. [342]

4.1.6.1 Motivatability and frequency

The most frequent partially motivatable item is not, which occurs on rank 21 and can be related to no. Government, on rank 133, is the first completely motivatable word in the BNC list. Figure 6 shows the association between the words’ rank and the four motivational categories. The most frequent items are predominantly unmotivatable, but the proportion of completely unmotivatable words drops relatively sharply within the first 500 items. Transparency also sets in with the 81st word. Its proportion rises very slightly and then declines again to remain relatively stable. By contrast, both partial and full motivatability show a very clear rise: the curve of partial motivatability is initially much steeper, but then both graphs rise almost parallel to each other.

If the four categories mentioned above are reduced to motivatability and unmotivatability, it becomes clear that the 1,130 most frequent lemmas behave differently from the rest of the English language. [343] Up to this point, most English words are unmotivatable; if words after rank 1,130 are considered as well, the English vocabulary is predominantly motivatable. The rising motivatability curve in Figure 7 allows the conclusion that lower-frequency words should be more motivatable than high-frequency words; however, the rise is so gradual that only words of considerably lower frequency may yield marked differences from the present study’s results.

[342] To these can be added 19 fully motivatable items with a self-elaborated lexical affix and one with an unrecorded grammatical affix, namely the -t in joint.

[343] The lines of motivatable and unmotivatable items in Figure 7 cross between ranks 1,126 and 1,138, i.e. at about rank 1,130. If no precise crossing point can be indicated for some of the graphs in the present study, this is because several instances may occur in which the proportion of two variables is identical, with intermediate fluctuation, followed only then by real separation (pointed out by Antony Unwin).
Figure 6: Motivatability and rank of the BNC items [plot not reproduced: x-axis Rank, 0 to 2,500; y-axis 0.0 to 1.0; curves MP, UN, MO, UT]

Figure 7: Motivatability and rank of the BNC items: simplified version [plot not reproduced: x-axis Rank, 0 to 2,500; y-axis 0.0 to 1.0; curves M, U]

4.1.6.2 Motivatability and part of speech

Table 37: Part of speech and motivatability of the BNC items

Part of speech      PoS class    MO    MP    UT    UN
noun                lex         162   546    60   447
adjective           lex          78   172     3   116
adverb              lex          77    77     6    46
pronoun             gr            9    21     -    11
verb                lex           8   283    35   198
preposition         gr            2    23     1    29
determiner          gr            1    15     3    24
conjunction         gr            -    11     2    15
interjection        gr            -     3     -     3
modal verb          gr            -     2     -    10
infinitive marker   gr            -     -     -     1

Table 37 indicates how many items within the eleven parts of speech distinguished in Kilgarriff’s BNC frequency list belong to the different motivational categories. The columns are sorted according to the strength of the motivatability, with MO followed by MP and so on. The category with most representatives within each part of speech is highlighted in grey in the original table. Summing up the results, one can say that lexical words are predominantly partially motivatable, while grammatical words are mainly completely unmotivatable, with the exception of pronouns, which share the distributive pattern of the lexical words. This may be due to the fact that many pronouns are either composed, such as everything and someone, or derived, such as himself, or share formal and semantic similarities with other words, e.g. they and them, which are both third person plural pronouns. Adverbs and interjections also present a special case in that they simultaneously reach a maximum in two categories. While interjections are evenly split between partial motivatability and the complete lack of motivatability, adverbs show a very clear lexical pattern. Surprisingly, they even represent the part of speech with the highest proportion of full motivatability.
However, this can easily be explained by the large number of derived items: 82 of 206 adverbs, i.e. 39.81%, are of the -ly type, and of these, 72, i.e. 87.80%, are fully and the remaining ten partially motivatable. [344]

Table 38 indicates the summed percentages of full and partial motivatability in relation to part of speech. It illustrates that apart from the infinitive marker, which has only one representative and can therefore show no significant distribution, all parts of speech show at least some degree of motivatability. While all lexical categories are predominantly motivatable, functional word classes are usually unmotivatable, except for pronouns, which come second in the motivational ranking.

[344] One may expect a certain difference between English and German here, as the German adverbs are often formally identical with their corresponding adjectives, so that German adverbs might rather be MPZ-motivatable.

Table 38: Motivatable BNC items and part of speech

Part of speech      % motivatable
adverb                      74.76
pronoun                     73.17
adjective                   67.75
noun                        58.27
verb                        55.53
interjection                50.00
preposition                 45.45
conjunction                 39.29
determiner                  37.21
modal verb                  16.67
infinitive marker               -

Another aspect of interest in the relation between motivatability and part of speech is the question of which parts of speech are most frequently involved in motivatability by a grammatical polyseme. It must be borne in mind that only those instances are marked as MPZ in which no other form of motivation could be found and where the directionality is obvious, or at least in which a paraphrase yields a satisfactory result. Functional words may not be motivated by formally identical other functional words. Under these conditions, most of the MPZ-motivatable words are nouns and verbs. Table 39, in which the part of speech in small caps relates to the motivating word, shows that 166, i.e. 87.83%, of these nouns can be motivated by verbs, and as many as 156, i.e. 95.71%, of the verbs in question can be motivated by nouns. Adjectives are more important in the motivation of nouns than in that of verbs, but they predominate in the analysis of adverbs such as round and short.

With nouns and verbs being the most common word classes, it is not surprising that most list words affected by motivatability by a grammatical polyseme should belong to those two parts of speech. However, if the parts of speech involving fewer than three tokens in Table 39 are disregarded for statistical reasons, motivatability by a grammatical polyseme is also proportionally most influential in nouns and verbs. MPZ-motivatable items make up about one-sixth of the nouns and one-third of all verbs. If their number is considered in relation to the number of motivatable items for each part of speech, figures rise to one-fourth for the nouns and even to as much as 56% for the verbs, which means that motivatability by a grammatical polyseme is the single most influential factor in the motivation of the English verbs analysed in the present study.

Table 39: Parts of speech involved in motivatability by a grammatical polyseme in the BNC items
(rows: part of speech of the list word; columns: part of speech of the motivating word, in small caps in the original)

         N     V     A   ADV   PRE   DET   PRO   CON   MOD   INT   INF
a       20     1     -     -     2     -     -     -     -     -     -
adv      2     -    11     -     -     -     -     -     -     -     -
con      1     1     -     -     -     -     -     -     -     -     -
det      1     1     -     -     -     -     -     -     -     -     -
inf      -     -     -     -     -     -     -     -     -     -     -
int      -     -     -     -     -     1     -     -     -     -     -
mod      -     1     -     -     -     -     -     -     -     -     -
n        -   166    21     -     -     2     -     -     -     -     -
pre      1     -     -     -     -     -     -     -     -     -     -
pro      -     -     -     -     -     -     -     -     -     -     -
v      156     -     6     -     -     1     -     -     -     -     -

4.1.6.3 Motivatability and word length

Motivatability requires the analysis of words into segments. Consequently, longer words should have a higher probability of being motivatable than shorter words. This is particularly plausible in the case of one-letter words, which can hardly be segmented into smaller constituents, but it should also apply to slightly longer words. Table 40 can be used for testing this hypothesis, as it associates the motivational categories with word length.
Table 40: Word length and motivatability of the BNC items: in percentages

Length      MO      MP       U   Tokens
1            -       -  100.00        2
2            -    6.25   93.75       32
3            -   25.32   74.68      154
4         1.44   37.58   60.99      487
5         3.72   45.58   50.70      430
6        10.93   45.13   43.94      421
7        17.51   52.52   29.97      337
8        25.11   59.91   14.98      227
9        30.86   61.71    7.43      175
10       37.84   51.35   10.81      111
11       39.44   56.34    4.23       71
12       58.06   41.94       -       31
13       50.00   50.00       -       14
14       37.50   62.50       -        8

As expected, the shortest words are usually unmotivatable, while the longest words are generally motivatable; twelve-, thirteen- and fourteen-letter words are even motivatable to 100%. This confirms the hypothesis formulated above. The dark cells in the original Table 40 further illustrate that the largest category shifts from opacity to partial motivatability and then to full motivatability with growing word length. Even if some figures in Table 40 prevent perfect linearity, which may be partially due to the relatively low number of category members, the general trend is clear: full motivatability grows and opacity decreases. This is supported by the joint consideration of the figures for the 11- to 14-character words, with full motivatability growing to 45.16% and the proportion of unmotivatable items decreasing to 2.42%. [345]

4.1.6.4 Motivatability and etymological origin

Table 41: Motivatability and etymological origin of the BNC items: in percentages

                     MO      MP     UT      UN   Tokens
Germanic           9.60   39.87   2.61   47.91      958
Romance            9.98   53.34   6.09   30.59     1363
Germ.-and-Rom.    76.06   20.42   1.41    2.11      142
Excluded              -   55.56      -   44.44       18
Unknown            5.56   27.78      -   66.67       18
Eponymic              -       -      -  100.00        1

The majority of the analysis items being of Romance origin, words with this etymological background constitute the largest absolute number in all motivational categories. As for the proportional distribution of the motivational categories, full motivatability is highest among the Germanic-and-Romance items.
Faced with this evidence, one is prompted to ask whether words of mixed origin are better integrated into the “society of words” than others. However, this result can be explained by the design of the analysis itself: words that are classified as Germanic-and-Romance must have been composed from different parts at some time in the past at least, so that the probability that they are still motivatable to some degree is inherently higher than the average. [346]

The largest number of Germanic words occurs in the category of the completely unmotivatable items, while the majority of the Romance words are partially motivatable. The reverse pattern of Romance and Germanic items is highly significant in the categories of partially motivatable, completely unmotivatable, and unmotivatable but transparent items (p < 0.00001) according to a chi-square test. [347] Surprisingly, the excluded words are predominantly motivatable with a ratio of ten partially motivatable to eight completely unmotivatable items. However, the number of affected items is too small to allow any relevant conclusions for words of exotic provenance.

[345] The partially motivatable words’ proportion of 52.42% allows no obvious conclusion.

[346] In this light, it is even more surprising that there should be words that are of mixed ancestry but not synchronically motivatable, namely line, perhaps, acknowledge, means and the acronym-turned-word okay.

Table 42: Simplified motivatability and etymological origin of the BNC items: in percentages

                            M        U   Tokens
Germanic                49.47    50.52      958
Romance                 63.32    36.68     1363
Germanic-and-Romance    96.48     3.52      142
Excluded                55.56    44.44       18
Unknown                 33.34    66.67       18
Eponymic                    -   100.00        1

If the results from Table 41 are summarised in terms of a binary distinction between motivatability and the lack thereof, as shown in Table 42, the Germanic words are more or less evenly split between the two categories, but with a slight majority of unmotivatable items.
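The chi-square comparison of Germanic and Romance items mentioned above for Table 41 can be illustrated as follows. This is a hedged sketch only: the study does not spell out the exact layout of its test, so the 2 x 3 contingency table below is one plausible reading, and the token counts are an assumption, reconstructed by converting the percentages of Table 41 back into whole tokens. All names are illustrative.

```python
import math

# Token counts reconstructed from the percentages and totals in Table 41
# (assumption: e.g. 39.87% of 958 Germanic items = 382 tokens).
observed = {
    "Germanic": {"MP": 382, "UT": 25, "UN": 459},
    "Romance": {"MP": 727, "UT": 83, "UN": 417},
}


def chi_square(table):
    """Return Pearson's chi-square statistic and the degrees of freedom."""
    rows = list(table)
    cols = list(next(iter(table.values())))
    row_totals = {r: sum(table[r].values()) for r in rows}
    col_totals = {c: sum(table[r][c] for r in rows) for c in cols}
    grand_total = sum(row_totals.values())
    statistic = 0.0
    for r in rows:
        for c in cols:
            # Expected count under independence of origin and category.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (table[r][c] - expected) ** 2 / expected
    dof = (len(rows) - 1) * (len(cols) - 1)
    return statistic, dof


statistic, dof = chi_square(observed)

# With exactly two degrees of freedom, the chi-square survival function
# has the closed form exp(-x / 2), so no statistics library is needed.
p_value = math.exp(-statistic / 2) if dof == 2 else None

print(round(statistic, 2), dof, p_value)
```

Under these assumptions the resulting p-value lies far below the threshold of 0.00001 reported in the text. The reservations expressed in footnote 347 about significance tests on this kind of data apply to the sketch as well.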
By contrast with the Germanic words, the Romance words are far more motivatable: with 63.32%, almost two-thirds are analytically linked to other lexical elements. A look at Table 41 reveals that the basic difference in the motivatability of Germanic and Romance words lies in the field of partial motivatability, whereas figures for full motivatability are almost identical. Furthermore, a higher percentage of Romance than of Germanic words is unmotivatable but transparent.

One impression gained during the calculation of the results is that English words with formal difficulties seem to be mainly of Romance origin. The distribution of the 398 partially motivatable items with the code F for formal difficulties confirms this hypothesis: with one exotic, seven Germanic-and-Romance, 107 Germanic and 283 Romance items, the proportion of the Romance items among the formally difficult words is 71.10% and thus clearly higher than their average share of 54.52%. This leads to yet another question, namely whether the three Romance subgroups behave in identical ways.

[347] The author is aware of the fact that the makeup of the present study makes the application of tests of significance problematic. Nonetheless, some tests were carried out in order to offer guidelines for the interpretation of the data.

Table 43: The motivatability of BNC items with a Greek origin

Category                       Code   Tokens       %   Sum
Full motivatability            MO          5    5.10   M 40.81%
Partial motivatability         MP         35   35.71
No mot., but transparency      UT          9    9.18   U 59.18%
No motivatability              UN         49   50.00

In total, 98 items from the list have a Greek origin. If one compares the motivatability of these Greek-based words with that of the English analysis items in general, [348] the words with Greek roots show a tendency to be far less motivatable than the average, at 40.81% as against 59.60%. This difference is observable in the domains of both full and partial motivatability.
By contrast, the percentage of transparent unmotivatable items is particularly high among the words with Greek as their original language.

Table 44: The motivatability of BNC items with a Greek origin vs. average motivatability: in percentages

                               MO      MP     UT      UN
Greek-based words            5.10   35.71   9.18   50.00
Analysis words in general   13.48   46.12   4.40   36.00

Of the 867 French words, only 25 ultimately go back to a Germanic origin. The other 97.12% are of Romance origin. [349]

Table 45: The motivatability of BNC items with a French origin

Category                       Code   Tokens       %   Sum
Full motivatability            MO        116   13.38   M 61.59%
Partial motivatability         MP        418   48.21
No mot., but transparency      UT         55    6.34   U 38.40%
No motivatability              UN        278   32.06

A comparison of the French-based words with the motivatability of the analysis items in general displays more similarities than the comparison of the Greek-based words with the average. The numerical differences are smaller, and there is no obvious tendency, as each group is a little larger than the other in one motivational and one non-motivational category (cf. Table 46). These results indicate that the words with a French origin or intermediate step are better integrated into the English language than the Greek-based ones. [350] With 61.59%, the French items are even more motivatable than the general average.

[348] Of course, this comparison can only yield approximate results, as the Greek words are included within the general average. A comparison with the motivatability of all but the Greek words would yield even more extreme results, even if the low number of words with a Greek origin may lead to minor changes only. The same kind of reservation applies to all other comparisons of subgroups with the general average in the present study.

[349] The extremely few Celtic words are marked as excluded and do not enter into this calculation anyway.
As there are 171 instances of motivatability by a grammatical polyseme among the French words, their MPZ ratio of 19.72% is slightly higher than the overall 15.80%.

Table 46: The motivatability of BNC items with French origin vs. average motivatability: in percentages

                               MO      MP     UT      UN
French-based words          13.38   48.21   6.34   32.06
Analysis words in general   13.48   46.12   4.40   36.00

In principle, it would be possible to classify all Romance or Germanic-and-Romance items involving either the codes r, f or b as Latin. However, it is more interesting to look at only those words marked with l that are either Romance or Germanic-and-Romance, but which involve neither French nor Greek diachronic steps and can therefore be regarded as purely Latin words as far as their Romance ancestry is concerned.

Table 47: The motivatability of BNC items with Latin origin

Category                       Code   Tokens       %   Sum
Full motivatability            MO         40   14.34   M 68.46%
Partial motivatability         MP        151   54.12
No mot., but transparency      UT          7    2.51   U 31.54%
No motivatability              UN         81   29.03

Of the 279 Latin words, 68.46% are motivatable, which is above the general average of 59.60%. The figures for both full and partial motivatability are higher than in the whole sample.

Table 48: The motivatability of BNC items with Latin origin vs. average motivatability: in percentages

                               MO      MP     UT      UN
Latin-based words           14.34   54.12   2.51   29.03
Analysis words in general   13.48   46.12   4.40   36.00

Summing up, the three subcategories that are subsumed under the label Romance behave differently with respect to motivatability. The Greek words, which are only Romance in the wider definition used here, stand out from the other categories.

[350] Note, though, that a French and Greek origin need not be exclusive, e.g. in the items with the code rFLK. Furthermore, the larger number of French items exerts a stronger influence on the general average than the lower number of Greek words.
While the French and Latin figures are more or less close to the general average, the motivatability of the Greek items drops markedly in relation to the other figures, particularly so as the words with French as one of their stages or Latin as their original basis are actually even more motivatable than the average in the domain of partial motivatability. This is only imperfectly counterbalanced by a particularly high proportion of transparent words with a Greek origin.

Table 49: The motivatability of BNC items with Greek, French and Latin origin vs. average motivatability: in percentages

                               MO      MP     UT      UN
Analysis words in general   13.48   46.12   4.40   36.00
Greek-based words            5.10   35.71   9.18   50.00
French-based words          13.38   48.21   6.34   32.06
Latin words                 14.34   54.12   2.51   29.03

4.1.6.5 Motivatability and period of origin

Table 50: Motivatability and period of origin of the BNC items: in percentages

                     MO      MP     UT      UN   Tokens
Old English        4.84   32.86   2.70   59.60      703
Middle English    13.58   52.44   5.28   28.70     1289
Later             24.65   48.91   4.57   21.87      503
No result         80.00       -      -   20.00        5

The majority of the analysis words with an Old English origin are not motivatable, in contrast to the items from all other periods. From a diachronic point of view, then, the overall figures for motivatability have steadily grown, from 37.70% in words attested in Old English via 66.02% in the Middle English additions to 73.56% for the latest items. Furthermore, the proportion of fully motivatable items has risen steadily, from a mere 4.84% among the oldest words to 24.65% in the most recent items. This development is paralleled by a steady decrease in the proportion of unmotivatable words. Thus, Leisi’s claim that the English language has lost its word-formation power (cf. Leisi 1999: 52 and Leisi 1961: 261) is not tenable, at least not for the most frequent items that were analysed in the present study.
These results indicate that the English language is not suffering from a loss of motivation, but rather that it is becoming increasingly motivated. This finding can be explained by the fact that while the language is in need of new words all the time, creation ex nihilo is a very rare phenomenon in word formation, so that the available resources are used and combined in order to yield new, usually motivated, words. While the earlier periods were busy providing the basic word stock, the later periods can draw on these large sources to form new lexical items.

4.1.7 Expandability

Table 51: Expandability of the BNC items

Category                 Code      Tokens       %
Full expandability       total       2199   87.96
                         B           1291
                         E            908
Partial expandability    total        273   10.92
                         B+F           12
                         B+FL           1
                         B+FS          14
                         B+FSL          2
                         B+FW          36
                         B+FWL          4
                         B+L           88
                         B+l            7
                         B+LI           7
                         B+lI           1
                         B+LI#          1
                         B+Z            5
                         E+#            2
                         E+C            3
                         E+F           13
                         E+FC           1
                         E+Fl           1
                         E+FS          10
                         E+FSL#         1
                         E+FW          15
                         E+FWL          1
                         E+L           41
                         E+l            4
                         E+L#           1
                         E+LI           2
No expandability         total         28    1.12
                         M             22
                         N              6

With as few as 1.12% unexpandable items, the English words analysed in the present study are almost all integrated into word families by their synthetic quality. It also becomes clear from Table 51 that the overwhelming majority of expansions are full expansions: of the 2,472 words that are expandable, 2,199, i.e. 87.96% of all items, are not restricted in any way. Depending on the approach one wants to take, even more words can be added to this category: for instance, 37 words display a lower-case b in the data base’s BNC-frequency column, which indicates that the item in question is only expandable into a word marked as British English in the dictionaries, such as phone-in or housing estate. However, this may seem an unnecessary restriction that could be neglected, so that the proportion of full expandability would be even higher. [351] Altogether, the impressive proportion of 98.88% of the 2,500 BNC words is expandable.
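The individual codes for partial expandability in Table 51 can be grouped into broader obstacle categories of the kind summarised in Table 53. The following is a minimal sketch of such a grouping, not the procedure actually used in the study: the way the part of each code after the plus sign is split into symbols (FS, FW, F, LI, lI, L, l, #, C, Z) is an assumption made for illustration, as are all names.

```python
import re
from collections import Counter

# Partial-expandability codes and their token counts as listed in Table 51.
partial_codes = {
    "B+F": 12, "B+FL": 1, "B+FS": 14, "B+FSL": 2, "B+FW": 36, "B+FWL": 4,
    "B+L": 88, "B+l": 7, "B+LI": 7, "B+lI": 1, "B+LI#": 1, "B+Z": 5,
    "E+#": 2, "E+C": 3, "E+F": 13, "E+FC": 1, "E+Fl": 1, "E+FS": 10,
    "E+FSL#": 1, "E+FW": 15, "E+FWL": 1, "E+L": 41, "E+l": 4, "E+L#": 1,
    "E+LI": 2,
}

# Assumed tokenisation of the obstacle part of a code. Two-character symbols
# are listed before their one-character prefixes so that, for example,
# "FS" is not read as "F" followed by an unknown "S".
SYMBOL = re.compile(r"FS|FW|F|LI|lI|L|l|#|C|Z")

symbol_tokens = Counter()
for code, tokens in partial_codes.items():
    obstacle_part = code.split("+", 1)[1]
    for symbol in SYMBOL.findall(obstacle_part):
        symbol_tokens[symbol] += tokens

partial_total = sum(partial_codes.values())
formal_total = sum(symbol_tokens[s] for s in ("F", "FS", "FW"))
marked_total = sum(symbol_tokens[s] for s in ("l", "L", "lI", "LI"))

print(partial_total, formal_total, marked_total, dict(symbol_tokens))
```

Because one code may combine several obstacles (for instance a formal difference and a label), the category totals obtained in this way add up to more than the number of partially expandable items.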
Table 52: Expandability of the BNC items: simplified table

Category                 Tokens       %   Sum
Full expandability         2199   87.96   98.88%
Partial expandability       273   10.92
No expandability             28    1.12    1.12%

Table 53 offers a summary of the kinds of obstacles occurring in the expansions. The percentages in the total rows (bold in the original) relate to the number of total list items, while the remaining figures represent the category-internal proportion. [352]

Table 53: Partial expandability of the BNC items: simplified table

Category                                              Code      Tokens       %
Formal differences                                    total        111    4.44
                                                      F             28   25.23
                                                      FS            27   24.32
                                                      FW            56   50.45
Semantic differences                                  total #        5    0.20
Marked expansions                                     total        162    6.48
                                                      l             12    7.41
                                                      L            139   85.80
                                                      lI             1    0.62
                                                      LI            10    6.17
Expandability by a formally related longer synonym    total C        4    0.16
Expandability by a grammatical polyseme               total Z        5    0.20

Contrary to expectations, the most important kind of obstacle does not concern formal aspects, but rather consists of marking: while for 6.48% of all items the only expansion that could be found lies outside the core vocabulary, only 4.44% of the 2,500 list words take an expansion exhibiting a formal difference. Within the category of formal obstacles, changes merely in the written form are by far the most frequent type with a proportion of roughly 50%, followed by changes in both the spoken and the written form and then by merely spoken differences, the last two types sharing the remaining half of the instances almost evenly. Semantic differences and expansions by longer synonyms or grammatical polysemes all concern less than 0.25% of the words and can therefore be disregarded.

[351] Cf. the alternative results in Section 5.2.

[352] Actually, the count for the L-labelled words should be two tokens lower, as until-then-secret as an expansion for until as conjunction and preposition is a BNC word that ought to be labelled LI if anything. However, this has no substantial effect on the results.

Yet the 162 marked expansions are not all equally strongly marked.
Thus, 12 items bear the same diasystematic label as their original base, e.g. the formal expansion purchaser and the list word purchase, which is likewise marked as formal.

Another important issue to be considered is the distribution of types of markedness. The items labelled with the codes l or L can be assigned to the categories in Table 54. Note that the category with the largest number of labels is not homogeneous, but rather a cover category comprising all domain labels such as medicine or sport. While computing is the largest uniform subcategory in this group, business, finance and economics taken together can be considered to form the most important focus of interest of the labelled expansions. There are seven words which are assigned two labels by the dictionaries, [353] namely

doless         dial. & colloq.
his-self       dial. & colloq.
me-too         BE, informal
okey-doke      BE, informal
pub crawl      BE, informal
therefrom      formal or law
transference   technical or formal

Among the labels referring to stylistic, regional or frequency-related information, the distinction between markedly formal and markedly informal [354] items is the one most frequently made. The only two dialectally marked expansions are doless and his-self. [355]

Table 54: Diasystematic labels of the BNC items

Label                                  Tokens
formal                                     39
BE                                         37
informal/colloquial                        21
technical [356]                             8
nonce word/nonce derivative [357]           5
dialectal                                   2
literary                                    1
rare                                        1
domains                                    42
    computing                               7
    business                                5
    finance                                 4
    law                                     3
    mathematics                             3
    philosophy                              3
    economics                               2
    grammar                                 2
    medicine/medical                        2
    physics                                 2
    architecture                            1
    biology                                 1
    linguistics                             1
    microscopy                              1
    physiology                              1
    politics                                1
    psychology                              1
    sport                                   1
    statistics                              1

[353] That us-ness receives the additional label jocular apart from colloquial is not of interest here.

[354] As mentioned before, informal and colloquial are considered synonymous in this kind of context.

[355] On the one hand, the fact that dialectally marked words are unlikely to be known by foreign speakers of English makes them poor candidates for expansion, but on the other hand, they form part of the synchronic word-stock of English. In addition, both words are marked as both dialectal and colloquial. Assuming that dialectal words need not be labelled as colloquial because this is understood anyway (words such as whereuntil are merely marked as dialectal in the dictionaries), one can deduce that doless and his-self are not only dialectal words, but also general English words that are situated on the colloquial level.

[356] Technical words could also be included under the domains, but they are even more aptly classified as a category of their own, in analogy to the label literary, which makes a stylistic statement.

[357] It is most surprising to find nonce words and nonce derivatives in dictionaries, as it is typically assumed that the reference works list only the established part of the vocabulary. However, closer inspection of the expansions anotherness, consequenceless, do-it-himself, minehood and otherwiseness reveals that they are all from the Oxford English Dictionary. Full-text search for nonce in both OALD and SOED yields no nonce lemmas, so that the largest dictionary seems to be the only exception to the general rule.

4.1.7.1 Expandability and frequency

It would have been desirable to show the association of expandability and frequency graphically, in analogy to the treatment of the other variables. However, the degree of expandability is so high throughout the 2,500-word sample and the variation is so minimal that it is not possible to illustrate the changes in a satisfying and meaningful way.
However, the minimal variation allows the conclusion that expandability should be observable in items considerably further down the frequency range as well.

4.1.7.2 Expandability and source size

Table 55: Expandability of the BNC items according to their sources

Source   Tokens       %
OALD       2071   82.84
SOED        249    9.96
OED          58    2.32
BNC          94    3.76
-            28    1.12

Table 56: Frequency of the BNC expansions

BNC freq.   Number of BNC exp. with this freq.
 1          58
 2           7
 3           5
 4           2
 5           2
 6           6
 7           2
 9           1
10           4
11           1
12           1
15           2
36           1
49           1
58           1

It is no exaggeration to say that for the overwhelming majority of the words, an expansion can be found within the learner’s dictionary OALD. The decreasing proportion of words from the SOED and the OED is also in line with expectations. By contrast, the slight increase in expansions from the BNC was not foreseen. As this may be due to the large number of ad-hoc formations in the corpus, the need arises to investigate the frequency of the BNC expansions. [358] With 58 tokens, 61.70% of the BNC expansions occur only once. [359] These results confirm the hypothesis that the BNC expansions are predominantly nonce formations that do not belong to the core of the vocabulary. They can be considered in a purely structural analysis, but from a cognitive point of view, their influence can be assumed to be only minor. [360] What is interesting to note is that there are three words with a BNC frequency of more than 30 (widely-used, ex-employee and second-round) which are not included in any of the dictionaries used here. However, their number is still so small that this cannot be taken as an indicator that the dictionaries are missing a relevant number of words. On the contrary, the BNC searches confirm the quality of the existing dictionaries’ macrostructure. [361]

With respect to word class, the BNC-expanded items show notable differences from the list words in general.
Thus, the lexical categories of nouns, verbs and particularly adjectives are clearly underrepresented, while the functional items such as conjunctions, prepositions and pronouns are usually overrepresented. In this respect, adverbs behave more like the grammatical words and occur more frequently than on average.

Table 57: Part of speech of the BNC-expanded items vs. average

Part of speech    Tokens BNC-expanded    % BNC-expanded    % average
noun              18                     19.15             48.60
verb              2                      2.13              20.96
adjective         1                      1.06              14.76
adverb            49                     52.13             8.24
preposition       7                      7.45              2.20
determiner        2                      2.13              1.72
pronoun           10                     10.64             1.64
conjunction       4                      4.26              1.12
modal verb        1                      1.06              0.48
interjection      -                      -                 0.24
inf. marker       -                      -                 0.04

358 The word whatever-it-was has been correctly coded here as having a frequency of four, but not elsewhere. However, this has no substantial effect on the results.
359 Of the items concerned, there are 18 fully motivatable, 30 partially motivatable, 5 completely unmotivatable and 5 unmotivatable but transparent words.
360 Nevertheless, a low frequency for the item considered as the best possible expansion may not really allow a conclusion as to the degree of quantitative consociation because it may be the case that particular words are integrated into the vocabulary by large numbers of hapax expansions.
361 Heid et al. (2004) and Evert et al. (2004) describe computer tools that allow corpus-based dictionary updating by automatically suggesting frequency-based inclusion and removal candidates.

Be that as it may, it is interesting to note how many functional words are actually expandable.
Table 58 gives examples for expansions for the different parts of speech:

Table 58: Expansions for English functional words

Part of speech       Item     Expansion
preposition          after    afternoon
determiner           the      first-past-the-post
pronoun              any      anything
conjunction          and      drag-and-drop
modal verb           may      maybe
interjection         hello    golden hello
infinitive marker    to       mother-to-be

4.1.7.3 Expandability and word length

As expansions are by definition longer than their bases, and as word length does not seem to grow infinitely, at least in practice, one prediction is that words should be less likely to be expandable in proportion to their length. This prediction can be tested with the aid of Table 59.

Table 59: Word length and expandability of the BNC items: in percentages

Word length    Full exp.    Partial exp.    No exp.    Tokens
1              100.00       -               -          2
2              87.50        12.50           -          32
3              90.91        7.79            1.30       154
4              91.99        7.80            0.21       487
5              92.33        7.44            0.23       430
6              85.27        13.54           1.19       421
7              86.94        11.28           1.78       337
8              82.38        15.86           1.76       227
9              85.71        12.57           1.71       175
10             82.88        16.22           0.90       111
11             78.87        18.31           2.82       71
12             93.55        -               6.45       31
13             85.71        7.14            7.14       14
14             75.00        25.00           -          8

The figures oscillate between 75% and 100% for full expandability and roughly 7% to 25% for partial expandability between words of particular lengths. In the no-expandability column, figures rise relatively constantly with growing length, but the ratio unexpectedly drops to 0% in the longest words. However, this drop may be explained as coincidental in a sample that is too small to be representative. On the whole, it seems that fewer words tend to be expandable with growing word length, but as there are considerable jumps in the data, number of letters and expandability cannot be related in any purely regular way. 362 What is certain, though, is that the majority of the words in the sample are not only expandable, but even fully so - and that even in the thirteen- and fourteen-letter words.
Nonetheless, one may expect a turning point in even longer words, and the phenomenon observed here may simply be attributed to the fact that the frequent words in a language are more likely to be expandable than others.

362 This is in line with the contradictory statements made in Bauer (1983: 66), who says that in “the Germanic languages, at least, there is no such thing as the longest compound”. At least in theory, one should then always be able to find an expansion for any word. However, he also recognises that “limitations on short-term memory may affect the length of compounds in actual use” (Bauer 1983: 67), so that a word such as great-great-great-great-great-great-great-great-great-great-great-great-great-grandfather is theoretically possible, but not used - which is why some very long words do not have an expansion.

4.1.7.4 Expandability and etymological origin

Even if the results in Table 51 show quite clearly that most of the English analysis items are expandable, there may be observable differences based on the respective words’ etymology. Table 60 seeks to explore these differences, but it conveys the impression that there are actually more similarities than differences.

Table 60: Expandability and etymological origin of the BNC items: in percentages

Origin            Full exp.    Partial exp.    No exp.    Tokens
Germanic          89.77        8.25            1.98       958
Romance           86.21        13.65           0.15       1363
Germ.-and-Rom.    92.25        2.82            4.93       142
Excluded          100.00       -               -          18
Unknown           83.33        16.67           -          18
Eponymic          -            100.00          -          1

For instance, the figures for full expandability of the Germanic and the Romance items are so close to each other that one may not want to stress the difference between the respective 89.77% and 86.21% unduly. Note, however, that the tendency for the Germanic items to be more fully expandable than the Romance words is also reflected by the larger proportion of Romance items that are partially expandable. Here, the difference of 5.40% is sufficiently large to make the point. By contrast, Germanic words also represent the largest proportion of unexpandable items with 19 representatives, followed by seven Germanic-and-Romance and merely two purely Romance words. This is particularly striking if it is taken into consideration that the ratio of Germanic to Romance items is about 7:10. Taking everything into account, then, the Romance words are slightly more expandable than the Germanic words with 99.86% as against 98.02% - however, this translates as a difference of only 1.83%. The excluded items allow another interesting observation to be made. Finkenstaedt and Wolff (1973: 161) assume that what they call exotic words are more isolated than other English words, but the results of the present study convey a very different picture: even if 18 such items do not constitute a representative amount, one can state that their 100%-ratio of full expandability lies slightly above the average of 98.88%. 363

4.1.7.5 Expandability and period of origin

Table 61: Expandability and period of origin of the BNC items: in percentages

Period            Full exp.    Partial exp.    No exp.    Tokens
Old English       92.32        7.25            0.43       703
Middle English    86.97        11.79           1.24       1289
Later             84.69        13.72           1.59       503
No result         60.00        20.00           20.00      5

There is a noticeable connection between expandability and period of origin, with the oldest words being not only more generally expandable, but also fully so. While full expandability decreases with recency, partial expandability figures rise for the more recent items. However, this growth in partial expandability cannot completely make up for the loss of full expandability, so that more of the words in the category Later are unexpandable. This can be explained by the fact that words need a certain degree of integration into a language before they can be expanded.
Consequently, the oldest words have an advantage over the more recent formations or loans. However, the ratio of 1.59% unexpandable items in the category of the most recent words is still relatively low.

363 However, the importance of this result must not be overestimated because of the strong impact of individual items’ behaviour in small samples. Furthermore, the most frequent items may behave differently from exotic words in general, as the foreign origin of words such as tea and coffee is secondary to their extreme frequency in the language. Actually, many language users may be unaware of the foreign origin of such very frequent words.

4.1.8 Consociation

Consociation consists of the combined values for motivatability and expandability and will therefore be dealt with rather briefly. The question whether the English language is dissociated is clearly negated by the evidence given in Table 62: of the 2,500 words that were analysed in the present study, only 6 are neither motivatable nor expandable. This translates as 0.24%. Of the 2,494 consociated words, only 22, i.e. less than 1% of all items, are integrated into word families merely through motivatability. 364 More importantly from a quantitative point of view, 1,004 items, i.e. 40.12%, can only be integrated into word families by means of expansion. 365 However, as many as 1,468 items - constituting the largest part with a proportion of 58.76% - are both motivatable and expandable. 366 Of these, the majority are subject to partial motivatability and full expandability. This raises the question of how many items are fully consociated and therefore completely unrestricted by obstacles in the domains of form, semantics etc.
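The category totals quoted in this section can be retraced from the cell counts of Table 62. The following is merely a minimal sketch under stated assumptions: the counts are transcribed from that table, while the dictionary layout and the helper names row_total and share are illustrative choices and not part of the study's own tooling.

```python
# Cell counts transcribed from Table 62.
# Rows: consociation codes; columns: motivational codes (MO, MP, UT, UN).
# Cells marked "/" in the table are left out here.
table_62 = {
    "B":  {"MO": 299, "MP": 991},
    "B+": {"MO": 31, "MP": 147},
    "M":  {"MO": 7, "MP": 15},
    "E":  {"UT": 93, "UN": 816},
    "E+": {"UT": 14, "UN": 81},
    "N":  {"UT": 3, "UN": 3},
}
SAMPLE_SIZE = 2500  # number of BNC list words analysed


def row_total(*codes):
    """Sum all cells of the given rows (hypothetical helper)."""
    return sum(sum(table_62[code].values()) for code in codes)


def share(count):
    """Percentage of the whole sample, rounded to two decimals."""
    return round(100 * count / SAMPLE_SIZE, 2)


both = row_total("B", "B+")              # motivatable and expandable
only_motivatable = row_total("M")        # rows with the code M
only_expandable = row_total("E", "E+")   # rows E and E+
dissociated = row_total("N")             # neither motivatable nor expandable

# The 299 items of the B/MO cell are the ones the text treats as fully consociated.
fully_consociated = table_62["B"]["MO"]
partially_consociated = SAMPLE_SIZE - fully_consociated - dissociated

print(both, only_motivatable, only_expandable, dissociated)
print(share(fully_consociated), share(partially_consociated), share(dissociated))
```

The four row totals add up to the 2,500 sample items, which serves as a simple consistency check on the transcription.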
Table 62: Degrees of consociation of the BNC items

                                  M               U
                              MO      MP      UT      UN
Full consociation       B     299     991     /       /
Partial consociation    B+    31      147     /       /
                        M     7       15      /       /
                        E     /       /       93      816
                        E+    /       /       14      81
No consociation         N     /       /       3       3

It is not surprising in the light of the general tendency set out above that the merely motivatable words display partial rather than full motivatability and that the unmotivatable items tend to be fully rather than partially expandable. 367 Table 62 also illustrates that only 299 of the English items can be considered completely consociated if the strictest standards are applied. 368 2,195 more words are partially consociated, 369 and only six items cannot be related to a word family at all. Put differently, this means that only 11.96% of the words analysed are so strongly consociated that it would be hard to disagree on their status, but that it is possible to detect a fairly satisfactory kind of family relationship in an additional 87.80%.

364 This corresponds to the summed figures in the row with the code M.
365 This corresponds to the summed figures in the rows E and E+.
366 This corresponds to the summed figures in the rows B and B+.
367 The verb matter has been correctly coded here as E/UN (instead of the impossible combination B/UN), but not elsewhere. However, this has no substantial effect on the results.
368 This corresponds to the darkest cell in Table 62.
369 This corresponds to the medium dark cells in Table 62.

4.1.8.1 Consociation and frequency

The problem observed in the association of expandability and frequency poses itself anew in the discussion of consociation and frequency. The extremely low proportion of items that are only motivatable or completely dissociated prevents a satisfactory graphical illustration - which is why the categories M and N are not included in Figure 8.
However, the remaining graphs illustrate very clearly that with decreasing frequency, there is a rise in the proportion of words that are consociated both analytically and synthetically, whereas the proportion of only expandable items decreases with rank. The two lines cross between ranks 1,151 and 1,173, i.e. round about rank 1,160. 370

[Figure 8: Consociation and rank of the BNC items. Horizontal axis: rank (0 to 2,500); vertical axis: 0.0 to 1.0; curves: B and E]

370 Based on the low number of only motivatable items in the highest frequency ranges, one may conclude that the point where the proportion of M-consociated words crosses the graphs for the B-consociated and the E-consociated words must lie very far down in the frequency ranges.

4.1.8.2 Consociation and part of speech

Table 63: Consociation and part of speech of the BNC items: in percentages

PoS             lex/gr    B        M        E         N        Tokens
adverb          lex       69.90    4.85     24.76     0.49     206
pronoun         gr        68.29    4.88     26.83     -        41
adjective       lex       67.21    0.54     31.98     0.27     369
noun            lex       58.02    0.25     41.65     0.08     1215
verb            lex       55.73    -        44.27     -        524
interjection    gr        50.00    -        33.33     16.67    6
preposition     gr        45.45    -        52.73     1.82     55
determiner      gr        34.88    2.33     62.79     -        43
conjunction     gr        25.00    14.29    57.14     3.57     28
modal verb      gr        16.67    -        83.33     -        12
inf. marker     gr        -        -        100.00    -        1

All lexical word categories show the same distributional pattern with a clear majority of items consociated in both directions, followed by merely expandable items in the second place. By contrast, most functional word categories have a tendency for the merely expandable items to form the largest group, which can be interpreted as indicative that grammatical words are less likely to be motivatable, and that expandability serves as the more important consociating factor here. The only exceptions are interjections, where the one item constituting the difference can be dismissed as unimportant, and pronouns, of which 28 are integrated both analytically and synthetically.
371 The conclusion that can be drawn from this is that in a model that refuses to see expandability as a consociating factor, most of the grammatical words would be dissociated.

371 In Section 4.1.6.2, an attempt has been made to explain the high motivatability of pronouns. As for the synthetic direction, it has to be noted that a relatively large number - 10 out of 41, i.e. almost one quarter - of the pronouns are only expandable in the BNC, a ratio that is very much higher than the overall 3.76%. Furthermore, many of these expansions are phrase compounds such as someone-or-other or where-are-they-now, on the word status of which there is considerable disagreement.

4.1.8.3 Consociation and word length

The results of the association of consociation and word length only partly correspond to prior expectations. On the one hand, it was to be expected that the shortest words, which are overwhelmingly integrated into word families according to the data in Table 64, would be integrated on the basis of their high degree of expandability as there are only few even shorter constituents. This is confirmed by the dark cells in the first five rows of the table. With the exception of a minor jump in ten-letter words, consociation by expandability alone decreases constantly with growing word length. By contrast, the proportion of words consociated by motivatability only seems to be on the increase. The proportion of simultaneously motivatable and expandable items shows a steady increase up to the nine-letter words and then fluctuates slightly below and above 90%. As the low number of representatives of the longest words allows no relevant conclusions, 11- to 14-letter words were again considered jointly. The outcome of 93.55% Bs, 4.03% Ms and 2.42% Es confirms the trends described above.
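The joint figures for the 11- to 14-letter words can be retraced from the corresponding rows of Table 64. The sketch below is an illustration only: it assumes that the absolute counts behind the published percentages can be recovered by rounding percentage times tokens to the nearest integer, and the names rows, to_counts and joint_pct are hypothetical.

```python
# Percentages (B, M, E, N) and token counts of the 11- to 14-letter rows of Table 64.
# A "-" in the table is entered as 0.0.
rows = {
    11: ((92.96, 2.82, 4.22, 0.0), 71),
    12: ((93.55, 6.45, 0.0, 0.0), 31),
    13: ((92.86, 7.14, 0.0, 0.0), 14),
    14: ((100.00, 0.0, 0.0, 0.0), 8),
}


def to_counts(percentages, tokens):
    """Recover absolute counts from rounded percentages (assumption: nearest integer)."""
    return [round(p * tokens / 100) for p in percentages]


counts = {length: to_counts(pcts, n) for length, (pcts, n) in rows.items()}

# Pool the four length classes and recompute the percentages.
total_tokens = sum(n for _, n in rows.values())
joint_counts = [sum(column) for column in zip(*counts.values())]
joint_pct = [round(100 * c / total_tokens, 2) for c in joint_counts]

print(total_tokens)   # pooled number of 11- to 14-letter items
print(joint_counts)   # pooled counts for B, M, E, N
print(joint_pct)      # pooled percentages for B, M, E, N
```

Pooling the counts rather than averaging the four percentage rows matters here, because the length classes differ considerably in size.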
Table 64: Consociation and word length of the BNC items: in percentages

Word length    B         M       E         N       Tokens
1              -         -       100.00    -       2
2              6.25      -       93.75     -       32
3              24.68     0.65    74.02     0.65    154
4              39.01     -       60.78     0.21    487
5              49.07     0.23    50.70     -       430
6              55.34     0.95    43.47     0.24    421
7              69.14     0.89    29.08     0.89    337
8              83.26     1.76    14.98     -       227
9              90.86     1.71    7.43      -       175
10             88.29     0.90    10.81     -       111
11             92.96     2.82    4.22      -       71
12             93.55     6.45    -         -       31
13             92.86     7.14    -         -       14
14             100.00    -       -         -       8

Another conclusion that can be drawn from the figures in Table 64 is that no word of eight letters or more is dissociated. One might have expected dual consociation to rise to a certain point and then decrease due to less strong expandability of long words, caused by the existence of fewer even longer words. The column with the figures for words that are merely motivatable but not expandable provides some evidence for this hypothesis, as it is possible to detect an increase in the proportion of longer words. However, the turning point for the B-items can be expected to come far beyond a length of 14 letters.

4.1.8.4 Consociation and etymological origin

Table 65: Consociation and etymological origin of the BNC items

Origin                  B      M     E      N
Germanic                460    14    479    5
Romance                 863    1     498    1
Germanic-and-Romance    130    7     5      -
Excluded                10     -     8      -
Unknown                 6      -     12     -
Eponymic                -      -     1      -

If the Germanic and the Romance words are compared with respect to their degree of consociation, the category of words that are both motivatable and expandable is clearly dominated by the Romance element, with 863 as against 460 tokens. This relation also obtains if relative figures are considered, with 63.32% Romance as against 48.02% Germanic B-words. The Germanic and the Romance items attain roughly similar absolute results in the category of merely expandable words, with the Romance words slightly in the lead. However, if the relative proportions are considered, the 50% proportion of Germanic words lies clearly above the 36.54% of Romance items.
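The relative figures cited for the Germanic and the Romance words follow directly from the absolute counts in Table 65. A minimal sketch, with the counts transcribed from the table and the function name shares chosen purely for illustration:

```python
# Absolute counts transcribed from Table 65; a "-" in the table is entered as 0.
table_65 = {
    "Germanic":             {"B": 460, "M": 14, "E": 479, "N": 5},
    "Romance":              {"B": 863, "M": 1, "E": 498, "N": 1},
    "Germanic-and-Romance": {"B": 130, "M": 7, "E": 5, "N": 0},
    "Excluded":             {"B": 10, "M": 0, "E": 8, "N": 0},
    "Unknown":              {"B": 6, "M": 0, "E": 12, "N": 0},
    "Eponymic":             {"B": 0, "M": 0, "E": 1, "N": 0},
}


def shares(origin):
    """Row percentages for one etymological class, rounded to two decimals."""
    row = table_65[origin]
    total = sum(row.values())
    return {code: round(100 * n / total, 2) for code, n in row.items()}


germanic = shares("Germanic")
romance = shares("Romance")

print(germanic["B"], romance["B"])  # both motivatable and expandable
print(germanic["E"], romance["E"])  # merely expandable
```

Normalising by the row total makes the comparison independent of the fact that the sample contains noticeably more Romance than Germanic items.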
The Romance words thus display a higher degree of consociation than the Germanic ones. As regards the minor categories, the degree to which the Germanic-and-Romance words are both motivatable and expandable is particularly striking. Moreover, in the category of the only-motivatable words they are also clearly above average, whereas in the category of the only-expandable items they are even more obviously below average. 372 The excluded exotic items are by no means more, but rather less dissociated than the average. Over 44% are merely expandable, but more than half of them are both motivatable and expandable. 373

372 It has been explained before that this is due to the special character of the mixed category as one whose representatives are per se most likely to be motivatable (cf. Section 4.1.6.4).
373 Again, this may be explained by the assumption that high-frequency exotic words cannot be compared to loans from the lower frequency ranges in this respect, and it must not be forgotten that these results are based on very few items.

4.1.8.5 Consociation and period of origin

Table 66: Consociation and period of origin of the BNC items: in percentages

Period            B        M        E        N       Tokens
Old English       37.41    0.28     62.16    0.14    703
Middle English    65.09    0.93     33.67    0.31    1289
Later             72.37    1.39     26.04    0.20    503
No Result         60.00    20.00    20.00    -       5

The dark cells in Table 66 indicate which form of consociation prevails in each of the periods distinguished in the present study. Thus, Old English words are mainly consociated by the weakest form of consociation, which involves only expandability. By contrast, Middle English and more recent items are particularly well consociated by being predominantly both motivatable and expandable with 65.09% and 72.37% respectively. With the due caution that a diachronic analysis consisting of only three periods requires, this data can be taken to suggest that there is a direct relation between the degree of consociation as measured here and the period of origin, with recency resulting in the highest level of consociation - at least in the high-frequency ranges. 374 This is also in line with the fact that mere motivatability as the second strongest form of consociation rises in the more recent periods.

374 In the lower frequency ranges, this effect may be counterbalanced by the expandability-reducing effect of word length.

4.1.9 Dissociation

Of the 2,500 words from the BNC under consideration in the present study, only the six listed in Table 67 can be said to be dissociated in the sense that they are neither motivatable nor expandable. Yet even those items, which do not belong to any synchronic morphosemantic word family, are not as antisocial as Leisi’s definition of dissociation may suggest: all of them can be related to existing words in some way or another. For instance, the interjection aye is related to the adverb yes, but in view of the item’s shortness, the formal differences, particularly in the pronunciation, are considered too large to bring about motivatability in the strictest sense. Three of the other five items, namely behalf, however and whereas, can be decomposed into transparent pseudo-constituents, and the adjective very - as in the very beginning - is formally identical with the adverb very. 375 There are even both formal and semantic similarities between beneath, the last of the six words, and underneath, so that this item must actually be taken as an instance of too strict classification. 376

Table 67: The dissociated English items 377

Lemma      Mot. by        M.    PoS     Et.    P.    R.      Freq.    L.
however    [how, [ever    UT    adv     g      m     157     60498    7
very       -              UN    a       rf     m     1218    8169     4
whereas    [where, [as    UT    conj    g      m     1528    6236     7
aye        -              UN    int     g      l     1800    5166     3
beneath    -              UN    prep    g      o     1870    4917     7
behalf     [be, [half     UT    n       g      m     2191    4012     6
375 If the adjective’s emphasizing element is also recognised in the adverb very, as in the construction your very own, one may even speak of a weak form of partial motivatability here. However, this meaning only comes third in LDOCE, and it is so different from the more common use of the adverb very that no attempt is made to stress this connection.
376 Be that as it may, beneath will be considered dissociated for the sake of systematicity.
377 The columns in Table 67 contain the lemma, the motivating constituents, the motivational code, the part of speech, the etymological origin, the period of first attestation, the rank, the frequency and the word length (in letters).

Even if the group of dissociated items is not very large, it deserves a systematic treatment in analogy to the other sections in this chapter. However, tables and charts will largely be dispensed with.

4.1.9.1 Dissociation and frequency

One of the six dissociated words ranks among the first 1,000 items, namely the adverb however, which occupies the 157th place in the list. Four items are situated between ranks 1,001 and 2,000, and behalf comes after rank 2,000. There is no particularly high concentration of these - admittedly very few - items in either the highest or lowest ranks of the sample. Therefore, one may be led to ask whether dissociation is frequency-independent, the hypothesis being that different factors play a dominant role in the consociation of words from different frequency ranges - expandability in the highest-frequency words and motivatability in the lower-frequency ones. Of course, the range of words analysed in the scope of the present research project is too restricted to allow representative conclusions for the whole of the English vocabulary, as the words treated here are all among the most frequent ones.
Nonetheless, it is possible to differentiate even within this sample and to contrast the words of different frequency groups within this narrow band in order to determine whether there are any clearly discernible tendencies between the highest-frequency and the lower-frequency words. If the hypothesis is right, the words belonging to ranks 1 to 100 should have a lower motivatability rate and a higher expandability rate than the average, while the items with ranks 2,401 to 2,500 should display a higher degree of motivatability and a lower degree of expandability than the average.

Table 68: Motivatability of BNC ranks 1-100 and 2,401-2,500: in percentages

      Ranks 1-100    Ranks 2,401-2,500    Average
MO    -              18.00                13.48
MP    20.00          49.00                46.12
UT    1.00           4.00                 4.40
UN    79.00          29.00                36.00

Table 69: Word-family integration of BNC ranks 1-100 and 2,401-2,500: in percentages

      Ranks 1-100    Ranks 2,401-2,500    Average
B     20.00          67.00                58.76
M     -              -                    0.88
E     80.00          33.00                40.12
N     -              -                    0.24

Table 68 and Table 69 provide evidence in favour of this assumption: the first 100 words have a joint motivational proportion of 20%. This is clearly below the average of 59.60%. By contrast, all of these words have an expansion, as against the average of 98.88%. Even if this very close result is based on an unrepresentative number of items, it suggests a tendency that is more clearly observable if only those items that are nothing but expandable are contrasted. In this respect, the highest-frequency ranges’ 80% are noticeably above the average of 40.12%. The figures for ranks 2,401 to 2,500 are also mainly in accordance with the hypotheses formulated above: the motivatability rate of 67% lies clearly above the average of 59.60%. However, the prediction that expandability should be lower than the average 98.88% is contradicted by a proportion of 100%, but again the figures are very close and based on a very small sample.
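The comparison just made combines rows of Table 68 and Table 69: the joint motivatability rate is the sum of MO and MP, and the overall expandability rate is the sum of B and E. The following sketch merely retraces this arithmetic; the dictionary keys and the helper names motivatable and expandable are illustrative assumptions.

```python
# Percentages transcribed from Table 68 (motivatability) and Table 69 (integration).
# A "-" in the tables is entered as 0.0.
table_68 = {
    "ranks_1_100":     {"MO": 0.0, "MP": 20.0, "UT": 1.0, "UN": 79.0},
    "ranks_2401_2500": {"MO": 18.0, "MP": 49.0, "UT": 4.0, "UN": 29.0},
    "average":         {"MO": 13.48, "MP": 46.12, "UT": 4.40, "UN": 36.00},
}
table_69 = {
    "ranks_1_100":     {"B": 20.0, "M": 0.0, "E": 80.0, "N": 0.0},
    "ranks_2401_2500": {"B": 67.0, "M": 0.0, "E": 33.0, "N": 0.0},
    "average":         {"B": 58.76, "M": 0.88, "E": 40.12, "N": 0.24},
}


def motivatable(group):
    """Joint motivatability: fully (MO) plus partially (MP) motivatable items."""
    return round(table_68[group]["MO"] + table_68[group]["MP"], 2)


def expandable(group):
    """Expandability: items coded B or E have an expansion."""
    return round(table_69[group]["B"] + table_69[group]["E"], 2)


# Predictions of the hypothesis, compared against the sample average.
high_less_motivatable = motivatable("ranks_1_100") < motivatable("average")
high_more_expandable = expandable("ranks_1_100") > expandable("average")
low_more_motivatable = motivatable("ranks_2401_2500") > motivatable("average")
low_less_expandable = expandable("ranks_2401_2500") < expandable("average")

print(high_less_motivatable, high_more_expandable)
print(low_more_motivatable, low_less_expandable)
```

Only the last of the four comparisons fails, which corresponds to the one prediction that the text describes as contradicted by the data.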
378 Furthermore, if only those words are considered that are nothing but expandable, the proportion of 33% lies considerably below the average of 40.12%, as expected. Therefore, it is possible to conclude from the combined results that the hypothesis formulated above is basically right and that high-frequency and lower-frequency words show a tendency to be dissociated on different grounds.

4.1.9.2 Dissociation and part of speech

In terms of part of speech, one may expect the dissociated words to belong predominantly to the category of functional, closed-class words. Yet even though there is one conjunction and one preposition, most of the other words are not part of the closed-class items as defined by Quirk et al. (1985: 67): nouns and adjectives are lexical words, adverbs are classified as open-class words as well, and interjections such as aye are regarded as a part of speech belonging to neither the open nor the closed-class category. A superficial purely quantitative approach therefore suggests that dissociation is a feature applying mainly to lexical and less to grammatical words. However, a closer look at the items in question reveals that the dissociated words belonging to the open classes are very untypical representatives of them. Thus, however is one of those adverbs which are not formed on the basis of productive suffixes such as -ly, -wards or -wise, and whose number is unlikely to be complemented by new additions, so that this adverb might almost be considered a functional, closed-class item as well. 379 A similar line of argument applies to the adjective and the noun in the group of dissociated words. Thus, the adjective very serves to determine the range of reference of the following noun: the very man could be paraphrased as (precisely) this man. In this sense, very is like a determiner in its semantic value - or rather in its function -, which also brings it closer to the category of closed-class functional items. Behalf is not a prototypical noun either, as it only seems to exist in constructions such as on behalf of sb. and on sb.’s behalf, which can usually be replaced by a prepositional phrase. 380 This can be interpreted in the sense that behalf has so little lexical meaning of its own that it is a noun only on formal grounds but that its actual meaning is more preposition-like. This line of argument is supported by the fact that the Comprehensive Grammar of the English Language designates the sequence on behalf of as a complex preposition (Quirk et al. 1985: 670). Faced with these idiosyncrasies, it is not really possible to make a generalising statement of the kind that dissociation is particularly prominent in open- or closed-class items - based only on six dissociated items, it would also lack statistical backing. Nonetheless, it is possible to conclude that the few dissociated items tend to be either functional words or open-class items displaying an extremely strong similarity to functional words.

378 In this case, a difference of two items would lead to a result which is directly opposite.
379 However, “we must not exaggerate the extent to which word classes […] are ‘closed’: new prepositions (usually of the complex type ‘preposition + noun + preposition’ like by way of) continue to arise” (Quirk et al. 1985: 72). Even if the process is slower than in the open classes, new functional words can still be created. For instance, speakers in the United States have come to use the plural forms of the personal pronoun you: you-all, y’all, you-uns, youse, you guys, youguys or youse guys (American Heritage Dictionary s.v. you-all at <http://www.bartleby.com/61/66/Y0026600.html>; 24.10.2006) - and it is not completely out of the question that future generations of speakers of English may feel the need to distinguish between an inclusive and an exclusive use of the first person plural.
380 John’s wife accepted the prize on his behalf then becomes John’s wife accepted the prize for him, and Don’t worry on my behalf can be paraphrased as Don’t worry about me (examples adapted from OALD s.v. behalf).

4.1.9.3 Dissociation and word length

The dissociated items comprise 3, 4, 6, 7, 7 and 7 characters and thus have an average word length of 5.67 letters. This is slightly below the general average of 6.18, but the figure is based on so few items that it is not meaningful.

4.1.9.4 Dissociation and etymological origin

Section 4.1.6.4 and Section 4.1.7.4 have already shown evidence that Romance English words are not only more likely to be motivatable than items with a Germanic origin, but that they are also more likely to be expandable into longer words. Given this, it comes as no surprise that only one of the six dissociated words is Romance, while the other five are of Germanic origin. Even if not too much importance should be attached to this ratio due to the low number of items involved in its calculation, one may still reach the conclusion that a foreign origin is no factor that can be made responsible for a word’s dissociation.

4.1.9.5 Dissociation and period of origin

Four of the six dissociated words date from the Middle English period. One of the remaining two items is attested in Old English, while the other one is a more recent addition to the language.

4.2 DWDS Core Corpus

The results for the German analysis items are rendered in a structure that parallels the description of the English items.

4.2.1 Frequency

The most frequent German word on the list is the definite article die, 381 which occurs 11,044,713 times in the DWDS Core Corpus. 382 The least frequent item on the list is Nahrung ‘food’ with 2,944 hits. The average frequency across all analysis items is 29,748.
As in English, all associations with word frequency are calculated on the basis of the respective items’ rank.

4.2.2 Word length

Table 70 lists the word length of the list items measured in terms of alphanumeric characters including hyphens. The German word length shows the distribution that was to be expected, with few very short words, reaching a maximum in the proportion of the six-letter words, and then dropping steadily again. 383 It must be noted that the single one-letter item is not genuinely German, but actually a French loan word, namely à. While one may be led to assume that 18- and 21-letter words should be perceived as extremely long and maybe almost unnaturally complex, a look at the actual items reveals that this is not so: selbstverständlich ‘natural’, Auseinandersetzung ‘discussion’, landwirtschaftlich ‘agricultural’ and sozialdemokratisch ‘social democratic’, which contain 18 letters each, are fairly inconspicuous words. That the adjective nationalsozialistisch ‘national socialist’, the longest word in the sample, comprises as many as 21 letters, is fairly surprising. 384

381 This extremely high figure may be partly attributed to the fact that die is not only the form of the feminine third person singular definite article, but also of the general plural definite article.
382 For reasons of reader-friendliness, DWDS is frequently used as an abbreviation for DWDS Core Corpus in the presentation of the results.
383 With the exception of a relatively high proportion of 18-letter words, which is however negligible due to the small number of tokens of the longest words.
384 The feeling that such words contain far fewer letters was confirmed in informal tests among native speakers of German, where estimates for the length of nationalsozialistisch and sozialdemokratisch ranged between 12 and 13 letters.

The items with a length of five to nine characters represent 67.32% of all analysed list words. The average word length is 7.22 letters.
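Both summary figures can be rechecked against the token counts of Table 70. The sketch below is a simple illustration that transcribes those counts; the variable names are hypothetical and not taken from the study.

```python
# Token counts per word length (in characters), transcribed from Table 70.
length_counts = {
    1: 1, 2: 27, 3: 90, 4: 247, 5: 322, 6: 404, 7: 339, 8: 332, 9: 286,
    10: 180, 11: 125, 12: 68, 13: 38, 14: 21, 15: 9, 16: 5, 17: 1, 18: 4, 21: 1,
}

total_items = sum(length_counts.values())

# Share of items with five to nine characters.
mid_range = sum(length_counts[length] for length in range(5, 10))
mid_range_share = round(100 * mid_range / total_items, 2)

# Weighted mean word length over all list items.
mean_length = round(
    sum(length * n for length, n in length_counts.items()) / total_items, 2
)

# Most common word length in the sample.
modal_length = max(length_counts, key=length_counts.get)

print(total_items, mid_range_share, mean_length, modal_length)
```

The mean has to be weighted by the number of tokens per length class; an unweighted average over the nineteen length values would overstate the typical word length considerably.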
Table 70: Length of the DWDS analysis items in number of characters

Number of characters    Tokens    %
1                       1         0.04
2                       27        1.08
3                       90        3.60
4                       247       9.88
5                       322       12.88
6                       404       16.16
7                       339       13.56
8                       332       13.28
9                       286       11.44
10                      180       7.20
11                      125       5.00
12                      68        2.72
13                      38        1.52
14                      21        0.84
15                      9         0.36
16                      5         0.20
17                      1         0.04
18                      4         0.16
21                      1         0.04

There seems to be a general tendency for the categories of word length recognised here to approach a relatively stable proportion between 10% and 20%. This quasi-convergence, which points to a very equal distribution shortly before rank 2,500, can be contrasted with a greater variance in the proportion of the corresponding English categories. 385 However, in German as in English, there is a decrease in the shortest items with decreasing frequency, while the longer words’ proportion increases.

385 Cf. Section 4.1.2.

[Figure 9: Word length and rank of the DWDS items. Horizontal axis: rank (0 to 2,500); vertical axis: 0.0 to 0.5; curves: 1 to 4, 5, 6, 7, 8, 9, 10 or more characters]

4.2.3 Morphology

4.2.3.1 Compounds

Of the 2,500 German analysis items, 133 - i.e. 5.32% - are classified as compounds. The remaining 2,367 words are either monomorphemic, motivatable by affixes only, or supported by less easily describable forms of morphosemantic motivation.

4.2.3.2 Affixes

The overwhelming majority of motivatable German list words contain affixes 386 of some kind. Table 71, Table 72, Table 73, Table 74 and Table 75 collect all the affix types involved in the motivational analyses together with an example word. 387 As mentioned in the methods section, the German affixes are far less well documented than their English counterparts. For this reason, affixes occurring in either the learner’s dictionary LGWDaF or the larger UW were not marked in the data base - in contrast to the affixes set off by a star in the first column, which are recorded in neither of the two sources. The Institut für deutsche Sprache’s online grammatical dictionary grammis 388 as well as the site <www.canoo.net>’s collection of affixes 389 were then searched for such self-elaborated affixes.

386 When reference is made to German affixes, this may also include linking morphemes.
387 For the conventions used in those tables see Section 4.1.4.2. The plus sign in front of the additional sources grammis and canoo documents that the search was successful; in all other instances of *UW, it was not. The single affix Milli- that only occurs in the unmotivatable but transparent Million is not admitted to the list.
388 This dictionary contains, among other things, information on the German affixes (cf. <http://hypermedia.ids-mannheim.de/pls/public/gramwb.ansicht>, October 2006).
389 The software at <http://www.canoo.net> permits relatively sophisticated morphological analyses of German words. However, its problems in the treatment of partial motivatability - thus, Band ‘band’ is not related to binden ‘bind’ - prevent it from being used for the general analysis of the German list words. Furthermore, this would have resulted in a different treatment of English and German list words in the analyses.

Table 71: German lexical prefixes and initial combining forms

Item        Example           Occurrence in LGWDaF or UW      Etymology  Tokens
ab-         abgeben                                           g          8
all-        allgemein                                         g          2
an-         ankommen                                          g          19
auf-        aufbauen                                          g          7
aus-        ausgeben                                          g          12
be-         bekämpfen                                         g          48
bei-        beitragen                                         g          2
herbei-     herbeiführen      *UW                             g-         1
*dar-       darstellen        *LGWDaF *UW *grammis +canoo     g          1
durch-      (durchführen      *UW                             g          1
ein-        einführen         *UW                             g          13
Einzel-     (einzeln          *UW                             g          1
ent-        entfernt                                          g          10
er-         erleichtern                                       g          37
erst-       erstmals                                          g          1
fest-       festhalten        *UW                             g          2
*ge-        Gebrauch          *LGWDaF *UW *grammis +canoo     g          1
Ge-         Gedanke                                           g-         12
Gegen-      gegenüber                                         g          4
General-    Generalsekretär                                   r          1
Grund-      Grundlage                                         g          3
Haupt-      Hauptstadt                                        g          2
hervor-     hervorheben       *UW                             g          3
Results 186 hinhinweisen *UW g 4 hinter- Hintergrund *UW g 1 innerinnerhalb g 1 interinternational r 1 Kilo- Kilometer r 1 Ko- Kollege *LGWDaF r 1 *Kon- Kontakt *LGWDaF *UW +grammis r 4 mindestmindestens ~*UW g 2 Mit- Mitglied g 1 Mittel- Mittelpunkt *UW g 2 nach- Nachfolger g 2 Nach- Nachmittag g 1 Ober- Oberleutnant g 2 ober- (oben ~*UW g 1 Philo- Philosoph *LGWDaF r 1 Polit- Politik r 1 Psycho- Psychologie *LGWDaF r 1 re- (Reform r 1 Rück- (zurück *UW g 2 Sonder- (besonderg 1 Spezial- (speziell r 1 Tele- Telegramm *LGWDaF r 1 überüberschreiten g 8 umumgeben *UW g 3 ununbekannt g 8 Un- Unruhe *LGWDaF g 3 unter- (unterwerfen *UW g 3 Ur- Ursprung g 2 ververbessern g 50 vorvorlegen g 10 Vor- Vortrag g 2 voraus- (voraussetzen *UW g 1 wowoher *UW g 7 wor- (worauf *UW g- 1 zerzerstören g 1 zu- (Zukunft *UW g 1 zurückzurückziehen *UW g 3 zusammen- Zusammenarbeit *UW g 2 DWDS Core Corpus 187 Table 72: German grammatical prefixes Item Example Occurrence in LGWDaF or UW Etymology Tokens *gegeeignet *LGWDaF *Duden *grammis +canoo g 8 Table 73: German lexical suffixes and final combining forms Item Example Occurrence in LGWDaF or UW Etymology Tokens -al national r 10 -ant 1 Kommandant *LGWDaF r 2 *-ant 2 interessant *LGWDaF *UW +grammis r 1 *-är 1 Sekretär *LGWDaF *UW +grammis r 2 *-är 2 revolutionär *LGWDaF *UW +grammis r 2 -at Kandidat *UW r 3 -ation Organisation *LGWDaF r 5 -bar furchtbar g 2 -chen bisschen g 2 *-de Gebäude *LGWDaF *UW *grammis *canoo g- 2 *-e Blüte *LGWDaF *UW +grammis g 66 *-el Flügel *LGWDaF *UW +grammis g 1 *-el(n) lächeln *LGWDaF *UW +grammis g 1 -ell kulturell r 4 *-en golden *LGWDaF *UW +grammis g 1 Results 188 *-ens meistens *LGWDaF *UW +grammis g- 5 *-ent Präsident *LGWDaF *UW +grammis r 2 *-enz Existenz *LGWDaF *UW +grammis r 3 -er Arbeiter r 35 -erei Malerei g 1 *-er(n) schildern *LGWDaF *UW *grammis *canoo g- 9 -fach mehrfach g 2 *-halb außerhalb *LGWDaF *UW *grammis *canoo g- 1 -haft ernsthaft g 2 -heit Einzelheit g 12 *-ie Philosophie 
*LGWDaF *UW +grammis r 4 *-iell finanziell *LGWDaF *UW +grammis r 2 *-ier Offizier *LGWDaF *UW *grammis +canoo r 1 -ier(en) definieren r 8 -ig geistig g 59 -ig(en) beschäftigen g 6 -igkeit Gerechtigkeit *LGWDaF g 1 -ik Technik *LGWDaF r 4 -ion Produktion *LGWDaF r 8 -isch künslerisch g 33 -isier(en) organisieren r 1 -ismus Sozialismus r 1 -ist Kommunist r 1 *-istisch kapitalistisch *LGWDaF *UW *grammis +canoo b- 3 -ität Aktivität r 5 DWDS Core Corpus 189 *-ition Opposition *LGWDaF *UW +grammis r 2 -iv subjektiv r 8 -keit Öffentlichkeit g 11 -kratie Demokratie *LGWDaF r 1 *-kunft Herkunft *LGWDaF *UW *grammis *canoo g- 1 -lei keinerlei r 1 -lein Fräulein g 1 -ler Ermittler g 1 -lich königlich g 88 -logie Psychologie *LGWDaF r 1 -los zweifellos g 1 -mäßig regelmäßig g 2 -nahme (Stellungnahme *LGWDaF g 2 -nis Ergebnis g 13 *-or Autor *LGWDaF *UW +grammis r 7 -ös religiös *UW r- 1 -reich erfolgreich g 3 *-s abends *LGWDaF *UW ~+grammis g- 17 *-sal Schicksal *LGWDaF *UW +grammis g 1 -sam wirksam *LGWDaF g 7 -schaft Freundschaft g 8 -seits einerseits *LGWDaF g 2 -sekretär Generalsekretär ~*UW r 1 *-t Fahrt *LGWDaF *UW +grammis g 11 *-tion Reaktion *LGWDaF *UW *grammis *canoo r- 2 -tum Eigentum g 4 *-uell aktuell *LGWDaF *UW +grammis r- 1 Results 190 *-um Zentrum *LGWDaF *UW *grammis *canoo r- 4 *-ung Sammlung *LGWDaF *UW +grammis g 143 -voll wertvoll g 1 -wärts vorwärts *LGWDaF g 1 -weise beispielsweise g 2 -zeug Flugzeug g 2 Table 74: German grammatical suffixes Item Example Occurrence in LGWDaF or UW Etymology Tokens *-e 1 Abgeordnete *LGWDaF *UW g- 35 *-e 2 welche *LGWDaF *UW g- 14 *-e 3 Leute *LGWDaF *UW g- 7 *-e 4 bitte *LGWDaF *UW g- 1 *-em welchem *LGWDaF *UW g- 1 *-en 1 anderen *LGWDaF *UW g- 4 *-en 2 geschlossen *LGWDaF *UW g- 3 *-end entsprechend *LGWDaF *UW +grammis g 23 *-er 1 später *LGWDaF *UW g- 8 *-er 2 welcher *LGWDaF *UW g- 5 *-in Freundin *LGWDaF *UW +grammis g 2 *-st möglichst *LGWDaF *UW g- 8 *-t geeignet *LGWDaF *UW g- 20 *-n Eltern *LGWDaF *UW g- 3 
Table 75: German interfixes
Item | Example | Occurrence in LGWDaF or UW | Etymology | Tokens
*-en- | Christentum | *LGWDaF *UW | g- | 1
*-es- | Bundestag | *LGWDaF *UW | g- | 5
*-n- | Augenblick | *LGWDaF *UW | g- | 1
*-s- | Geburtstag | *LGWDaF *UW | g- | 8

The total number of affix tokens amounts to 1,150. Two affixes are recorded for 40 list words in spite of the usually binary analysis. 390 In total, then, 1,110 words are motivatable by at least one of the 153 affix types that can be found in the tables above. There are 62 prefix types, 87 suffix types and four linking morpheme types. The suffixes also reach the highest figures as far as the tokens are concerned. Lexical suffixes constitute the largest category with a total of 73 types and 664 tokens. The only grammatical prefix, ge-, occurs eight times in the list words, and the four linking morphemes only occur 15 times in total. The most frequent prefix is ver- with 50 tokens, closely followed by be-, which occurs 48 times. 26 prefixes occur only once. Of the suffixes, lexical -ung is the most frequent one with 143 tokens, even though it is not recorded in the dictionaries under consideration. The second most frequent suffix is -lich with 88 instances. In spite of these large numbers, 24 lexical suffixes are only recognised in a single item on the analysis list.
Table 76: German affix types and tokens
Category | Types | Tokens | Token-type ratio
Prefixes total | 62 | 337 | 5.44
Prefixes lexical | 61 | 329 | 5.39
Prefixes grammatical | 1 | 8 | 8.00
Suffixes total | 87 | 798 | 9.17
Suffixes lexical | 73 | 664 | 9.10
Suffixes grammatical | 14 | 134 | 9.57
Linking morphemes total | 4 | 15 | 3.75

390 Those words are Anforderung, ausgezeichnet, Beamte, beispielsweise, berühmt, beschäftigen, beseitigen, bestätigen, beteiligen, Christentum, Delegierte, eigentümlich, eingehend, erörtern, erstmals, Gebäude, gebildet, Gedanke, geeignet, gegenseitig, genannt, Generalsekretär, Geschäft, geschlossen, Gesichtspunkt, geworden, hinaus, hinein, letztere, mehrere, meiste, mindestens, Nachfolger, Oberst, Politik, Psychologie, umgekehrt, verlieren, vorläufig and wenigstens.

The token-type ratios of the lexical and the grammatical suffixes are relatively similar: both kinds of suffix are re-used on average between nine and ten times in the list words. By contrast, prefixes and linking morphemes have increasingly lower token-type ratios. Altogether, 49 of the German affix types and 458 of the affix tokens are recorded in neither of the two reference works. This corresponds to 32.03% of all affix types and 39.83% of all affix tokens - figures that are shockingly large, particularly if it is borne in mind that the most frequent of all affixes belongs to this group. As 22 of the so-called self-elaborated affixes can be found in grammis, and five of the affixes that were not retrievable anywhere else are recorded in canoo, this can be seen as an indication that the self-elaborated affixes are not subjective creations of the analyst, but fairly established. The fact that the other 22 affixes, most of them grammatical, were not found in any dictionary-like reference works evokes the impression that grammatical affixes are considered to belong to the domain of grammar, but not to that of the lexicographical description of the language.
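The token-type ratios in Table 76 and the shares of unrecorded affixes follow directly from the type and token counts. The sketch below recomputes them in Python as a plausibility check; the dictionary `affix_counts` and the function name are assumptions made for this illustration only.

```python
# (types, tokens) per affix category, copied from Table 76.
affix_counts = {
    "lexical prefixes": (61, 329),
    "grammatical prefixes": (1, 8),
    "lexical suffixes": (73, 664),
    "grammatical suffixes": (14, 134),
    "linking morphemes": (4, 15),
}


def token_type_ratio(types, tokens):
    """Average number of list words in which one affix type is re-used."""
    return tokens / types


ratios = {
    name: round(token_type_ratio(types, tokens), 2)
    for name, (types, tokens) in affix_counts.items()
}

total_types = sum(types for types, _ in affix_counts.values())     # 153
total_tokens = sum(tokens for _, tokens in affix_counts.values())  # 1150

# Affixes recorded in neither LGWDaF nor UW, as stated in the text.
unrecorded_types, unrecorded_tokens = 49, 458
share_types = round(100 * unrecorded_types / total_types, 2)      # 32.03
share_tokens = round(100 * unrecorded_tokens / total_tokens, 2)   # 39.83

print(ratios)
print(total_types, total_tokens, share_types, share_tokens)
```

The totals of 153 types and 1,150 tokens produced by the sums agree with the figures given in the running text.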
Interestingly, it is not the Universalwörterbuch that contains more of the affixes that were needed in the analyses, but the learner’s dictionary Langenscheidt Großwörterbuch Deutsch als Fremdsprache. While 49 affixes are not recorded in either of the dictionaries, 22 more are missing from the UW, but only 16 additional affixes cannot be found in LGWDaF. These results indicate that affixes only play a minor role in German lexicography aimed at native speakers, but that the awareness of their importance increases in learner lexicography.

4.2.4 Etymological origin

With merely 16 subcategories, the table recording the etymological origin of the DWDS items is considerably shorter than its English equivalent - which can be ascribed to the fact that no internal distinction is made between the words of Romance origin - with the exception of Greek. 391 Furthermore, there are no eponymic items among the 2,500 DWDS analysis words. With Germanic lexemes accounting for almost 81% of the total, it is very evident from the data that German is a Germanic language - a fact that hardly anyone would have called into question. Nonetheless, about one-sixth of its 2,500 most frequent items are of Romance origin, to which another 4% of Germanic-and-Romance words can be added. The two remaining minor categories of excluded words with a more exotic origin and of words of unknown origin together constitute less than 1% and can therefore be disregarded. What is interesting, though, is the fact that of the 464 words involving either only Romance constituents or containing at least one Romance element, 96 have their origin in the Greek language. This corresponds to 3.84% of the total list words and 20.69% of the Romance items - a proportion that is higher than one may have expected.

391 Cf. Section 3.5.4 for the inclusion of Greek among the Romance languages.
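The Greek proportions quoted above can be retraced from the code counts in Table 77. The Python sketch below does so under one assumption that is not stated in this passage: that a k in an etymological code marks Greek origin. The assumption is made here only because the codes containing k add up to exactly the 96 Greek items mentioned in the text; the variable names are likewise illustrative.

```python
# Token counts per etymological code, copied from Table 77.
romance_codes = {"r": 269, "r-": 11, "rGR": 6, "rh": 1,
                 "rk": 74, "rk-": 2, "rkh": 1}
mixed_codes = {"b": 1, "b-": 80, "bk-": 19}
TOTAL_ITEMS = 2500

# Words with only Romance constituents or at least one Romance element.
romance_related = sum(romance_codes.values()) + sum(mixed_codes.values())

# Assumption (not stated in this passage): codes containing "k" are Greek.
all_codes = {**romance_codes, **mixed_codes}
greek = sum(n for code, n in all_codes.items() if "k" in code)

share_of_list = round(100 * greek / TOTAL_ITEMS, 2)
share_of_romance_related = round(100 * greek / romance_related, 2)

print(romance_related, greek, share_of_list, share_of_romance_related)
# 464 96 3.84 20.69
```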
Table 77: Etymological origin of the DWDS items
Category | Code | Tokens | %
Germanic | total | 2022 | 80.88
Germanic | g | 1435 | 57.40
Germanic | g- | 586 | 23.44
Germanic | gRG | 1 | 0.04
Romance | total | 364 | 14.56
Romance | r | 269 | 10.76
Romance | r- | 11 | 0.44
Romance | rGR | 6 | 0.24
Romance | rh | 1 | 0.04
Romance | rk | 74 | 2.96
Romance | rk- | 2 | 0.08
Romance | rkh | 1 | 0.04
Germanic-and-Romance | total | 100 | 4.00
Germanic-and-Romance | b | 1 | 0.04
Germanic-and-Romance | b- | 80 | 3.20
Germanic-and-Romance | bk- | 19 | 0.76
Excluded | total | 13 | 0.52
Excluded | e | 11 | 0.44
Excluded | e- | 2 | 0.08
Unknown | total | 1 | 0.04
Unknown | uh | 1 | 0.04

It has been mentioned before that the etymological description of the German words in the relevant dictionaries is far from complete. Table 78 indicates that no etymological information could be found for 700, i.e. 28%, of the list items. Of the cases with a self-elaborated etymology, over 83% are represented by the Germanic items alone. 392 While these figures may at first sight seem too large to permit any reasonably valid calculations, one must not forget that those self-elaborated etymologies are not made up in an ad hoc manner - usually, they emerge from the combination of the etymologies for the relevant base and the discernible affixes, e.g. in the case of Überraschung ‘surprise’ from überraschen ‘to surprise’ and -ung, or they are based on a related word within the same word family, e.g. in the case of ausdrücken ‘to express’/ Ausdruck ‘expression’. Fairly general agreement can be expected in most of the etymologies marked with a minus sign, and the marking is basically done for reasons of systematicity.

392 The high proportion of items with a mixed origin can be explained by the fact that some words are identified as Germanic in the dictionaries, even though they actually contain a Romance affix. For the sake of the original-language approach (cf. Section 3.5.4), such words were re-interpreted as b-.

Table 78: Self-elaborated German etymologies
Code | Tokens | %
g- | 586 | 83.71
r- | 11 | 1.57
rk- | 2 | 0.29
b- | 80 | 11.43
bk- | 19 | 2.71
e- | 2 | 0.29
Total | 700 | 28.00

Another observation is that several words on the list have entered German via English, e.g. Partner, Radio, Lord and Sir, but their number is not very large. The most surprising anglicisms are the functional words the and of, which occupy ranks 506 and 653 respectively. This can be explained by the large number of English book titles and quotations in the DWDS Core Corpus.

As far as the etymological origin of the affixes is concerned, there are 109 Germanic, 43 Romance and 1 Germanic-and-Romance affix types. If the actual tokens are counted, the order is the same, but the difference between the Germanic and the Romance items is more pronounced: there are 1,000 Germanic, 147 Romance and 3 Germanic-and-Romance affix tokens. Table 79 shows the distribution of affix types and tokens in relation to the words’ etymology.

Table 79: Etymological distribution of the German affix types (each cell: types / tokens)
Origin | Lexical prefixes | Grammatical prefixes | Lexical suffixes | Grammatical suffixes | Linking morphemes
Germ. | 50 / 315 | 1 / 8 | 40 / 528 | 14 / 134 | 4 / 15
Rom. | 11 / 14 | - | 32 / 133 | - | -
G.-a.-R. | - | - | 1 / 3 | - | -

It immediately becomes apparent that all grammatical affixes, be they prefixes, suffixes or linking morphemes, are of Germanic origin. Therefore, only the lexical categories can be compared with respect to their Germanic or Romance origin. Here, the Germanic affixes yield higher results than the Romance affixes both in terms of the number of types and in the number of tokens of prefixes and suffixes.

Table 80: Etymology and token-type-ratio of the German affixes
Origin | Lex. prefixes | Gramm. prefixes | Lex. suffixes | Gramm. suffixes | Link. morph.
Germ. | 6.30 | 8.00 | 13.20 | 9.57 | 3.75
Rom. | 1.27 | - | 4.16 | - | -
G.-a.-R. | - | - | 3.00 | - | -

The association of the token-type-ratios with the respective etymologies suggests that the Germanic lexical suffixes are used in the production of the largest number of tokens if the predominantly binary analysis is used.
In addition, not only are the Germanic lexical suffixes more productive than the Romance ones, but the Germanic lexical prefixes are also more productive than their Romance equivalents. Summing up, the Germanic element is the more productive one in the German language as measured here.

Figure 10: Etymological origin and rank of the DWDS items
[Figure not reproduced; x-axis: Rank (0 to 2,500), y-axis: 0.0 to 1.0, curves: g, r, b]

Of the 1,110 German words containing an affix, 880 are classified as Germanic, 141 as Romance, 84 as Germanic-and-Romance, four as excluded, and one has an unknown origin. Knowing that 40 words contain two affixes, and knowing that the 147 Romance affix tokens appear in 141 words, it is possible to conclude that the Romance affixes usually combine with a Romance base because otherwise the respective words would have been marked as Germanic-and-Romance. By contrast, there are 1,000 Germanic affix tokens, but only 880 Germanic words containing affixes. As 35 of the two-affix-words contain two Germanic affixes, 393 and as the exotic and the unknown items contain a total of 6 Germanic affixes, 79 Germanic affix tokens (1,000 - 880 - 35 - 6) must have combined with a Romance or mixed base to form complex words of the etymological category Germanic-and-Romance. In conclusion, while the Romance German affixes tend to attach to Romance bases only, Germanic affixes are most likely to combine with a Germanic base, but Romance or mixed bases are also a possible option. One can take Figure 10 to conclude that German is a predominantly Germanic language irrespective of its words’ frequency, but that the Romance and the mixed Germanic-and-Romance element increases with decreasing rank/frequency, if only slightly.

4.2.5 Motivatability

Table 81 lists all the detailed motivational codes encountered in the analysis of the German words.
Table 81: The detailed motivational codes of the DWDS items
Category | Code | Tokens | %
Full motivatability | total | 442 | 17.68
 | MO | 212 | 8.48
 | MOg | 78 | 3.12
 | MOl | 147 | 5.88
 | MOS | 4 | 0.16
 | MOSl | 1 | 0.04
Partial motivatability | total | 1221 | 48.84
 | MP# | 286 | 11.44
 | MP# l | 1 | 0.04
 | MP# R | 11 | 0.44
 | MP# RT | 59 | 2.36
 | MPA | 30 | 1.20
 | MPA# | 1 | 0.04
 | MPA# RT | 3 | 0.12
 | MPACRT | 1 | 0.04
 | MPAg | 6 | 0.24
 | MPAgRT | 2 | 0.08
 | MPAl | 24 | 0.96
 | MPAlRT | 13 | 0.52
 | MPART | 56 | 2.24
 | MPCg | 1 | 0.04
 | MPCg# | 1 | 0.04
 | MPCRT | 14 | 0.56
 | MPF | 82 | 3.28
 | MPF# | 42 | 1.68
 | MPF# R | 3 | 0.12
 | MPF# RT | 5 | 0.20
 | MPFC | 1 | 0.04
 | MPFg | 5 | 0.20
 | MPFg# | 7 | 0.28
 | MPFgR | 1 | 0.04
 | MPFgRT | 1 | 0.04
 | MPFL | 10 | 0.40
 | MPFl | 28 | 1.12
 | MPFL# | 3 | 0.12
 | MPFl# | 9 | 0.36
 | MPFL# R | 1 | 0.04
 | MPFl# R | 2 | 0.08
 | MPFL# RT | 1 | 0.04
 | MPFLI | 3 | 0.12
 | MPFLI# R | 1 | 0.04
 | MPFLIl | 3 | 0.12
 | MPFLIl# | 1 | 0.04
 | MPFLl | 3 | 0.12
 | MPFLl# | 1 | 0.04
 | MPFLR | 1 | 0.04
 | MPFR | 16 | 0.64
 | MPFRT | 4 | 0.16
 | MPFS# | 1 | 0.04
 | MPFS# RT | 1 | 0.04
 | MPFSCR | 1 | 0.04
 | MPFSg# | 1 | 0.04
 | MPFSL# R | 1 | 0.04
 | MPFSR | 2 | 0.08
 | MPFSRT | 1 | 0.04
 | MPFV | 38 | 1.52
 | MPFV# | 20 | 0.80
 | MPFVg | 2 | 0.08
 | MPFVg# | 2 | 0.08
 | MPFVg# RT | 1 | 0.04
 | MPFVL | 1 | 0.04
 | MPFVl | 16 | 0.64
 | MPFVl# | 1 | 0.04
 | MPFVR | 1 | 0.04
 | MPFVU | 49 | 1.96
 | MPFVU# | 23 | 0.92
 | MPFVU# RT | 1 | 0.04
 | MPFVUg | 6 | 0.24
 | MPFVUg# | 3 | 0.12
 | MPFVUL | 1 | 0.04
 | MPFVUl | 2 | 0.08
 | MPFVUL# | 1 | 0.04
 | MPFVUR | 1 | 0.04
 | MPFWg# | 1 | 0.04
 | MPFWl | 1 | 0.04
 | MPg# | 29 | 1.16
 | MPgRT | 1 | 0.04
 | MPL | 14 | 0.56
 | MPL# | 19 | 0.76
 | MPl# | 23 | 0.92
 | MPL# RT | 1 | 0.04
 | MPl# RT | 1 | 0.04
 | MPLCR | 1 | 0.04
 | MPLCRT | 1 | 0.04
 | MPLg | 1 | 0.04
 | MPLI | 6 | 0.24
 | MPLI# | 4 | 0.16
 | MPLI# RT | 1 | 0.04
 | MPLIg# | 1 | 0.04
 | MPLIl | 9 | 0.36
 | MPLIl# | 1 | 0.04
 | MPLl | 4 | 0.16
 | MPLl# | 3 | 0.12
 | MPlR | 1 | 0.04
 | MPLRT | 4 | 0.16
 | MPlRT | 1 | 0.04
 | MPR | 11 | 0.44
 | MPRT | 35 | 1.40
 | MPZ | 99 | 3.96
 | MPZ# | 5 | 0.20
 | MPZ# T | 3 | 0.12
 | MPZ# T? | 2 | 0.08
 | MPZFS | 1 | 0.04
 | MPZFS# T | 1 | 0.04
 | MPZLIT? | 1 | 0.04
 | MPZT | 11 | 0.44
 | MPZT? | 4 | 0.16
No motivatability | total | 704 | 28.16
 | UN | 704 | 28.16
No mot., but transparency | total | 133 | 5.32
 | UT | 74 | 2.96
 | UT? | 59 | 2.36

393 The remaining five two-affix words are Romance-Romance Generalsekretär, Politik and Psychologie, and the mixed Germanic-and-Romance Nachfolger and verlieren.

These results can be summarised as follows:

Table 82: Motivatability of the DWDS items: simplified table
Category | Code | Tokens | %
Full motivatability | MO | 442 | 17.68
Partial motivatability | MP | 1221 | 48.84
No motivatability, but transparency | UT | 133 | 5.32
No motivatability | UN | 704 | 28.16
M (MO and MP together): 66.52%
U (UT and UN together): 33.48%

33.48% of the 2,500 list words are completely dissociated from the analytical point of view. In 133 cases, the item in question can be related to other German words on a purely formal basis. This corresponds to 5.32% of the total list words and to 15.89% of the 837 unmotivatable words. 8.48% of the words analysed are completely motivatable without any kind of restriction. To this can be added 9.20% cases with minor restrictions such as affixes that are not contained in the dictionaries used here or formal obstacles which are confined to the spoken modality. Altogether, 17.68% of the 2,500 words are completely motivatable according to the definition laid out in Section 3.3.2.3. In addition, 48.84% of the items belong to the category of partial motivatability. If all words that are motivatable in some way or another are counted together, almost precisely two-thirds of the German sample words are consociated in the analytical direction.

Table 83 presents the simplified figures for the variables of partial motivatability. 394 The bold percentages (those in the rows marked total) relate to the number of total list items, while the unmarked figures represent the category-internal proportion. 395 The most important factor obscuring the analysis of partially motivatable German words is semantic difference, which applies to 23.56% of the list items. The second most frequent feature is formal in nature, at 16.60%, followed by incomplete analysability and self-elaborated affixes, which represent an obstacle in 12.96% and 8.80% of the instances respectively.
The picture that emerges is therefore one in which one main feature plus three more or less similarly frequent secondary features are responsible for partial motivatability in German.

394 As in the case of the English words, the figures are based on all codes with the exception of the category MO. Among the incompletely analysable items, the code under consideration is RT - and not simply T -, so that T-codes occurring in MPZ words are not included. FV and FVU are calculated separately, so that none contains the other.
395 The category of incomplete analysability is special in that the sum of the items in the rows below - in this case, 399 - is not identical with the result indicated under total, which is 324. This is due to the fact that 75 items contain both the code A and the code RT. Consequently, instances of MPA# RT (3x), MPACRT (1x), MPAgRT (2x), MPAlRT (13x) and MPART (56x) must be counted only once. The category-internal percentages relate to the 399 instances, though.

Table 83: Partial motivatability of the DWDS items: simplified table
Category | Code | Tokens | %
Formal differences | total | 415 | 16.60
 | F | 234 | 56.39
 | FS | 10 | 2.41
 | FW | 2 | 0.48
 | FV | 82 | 19.76
 | FVU | 87 | 20.96
Semantic differences | total # | 589 | 23.56
Incomplete analysability | total | 324 | 12.96
 | R | 55 | 13.78
 | RT | 208 | 52.13
 | A | 136 | 34.09
Marked constituents | total | 103 | 4.12
 | L | 72 | 69.90
 | LI | 31 | 30.10
Motivatability by a clipping or a formally related shorter synonym | total C | 21 | 0.84
Motivatability by a grammatical polyseme | total Z | 127 | 5.08
Self-elaborated affixes | total | 220 | 8.80
 | g | 72 | 32.73
 | l | 148 | 67.27

When discussing the role of semantic restrictions, it must not be forgotten that words with very doubtful constituents were excluded from the beginning, so that semantic difficulties may actually be present in even more than 23.56% of the German words. About 56% of the formally difficult words involve an obstacle in both the spoken and the written variety.
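The two kinds of percentage in Table 83, and the double counting described in footnote 395, can be made explicit with a short calculation. The Python sketch below is an illustration only; `internal_shares` and the variable names are introduced here and do not come from the original study.

```python
TOTAL_ITEMS = 2500

# Subcategory counts copied from Table 83.
formal = {"F": 234, "FS": 10, "FW": 2, "FV": 82, "FVU": 87}
incomplete = {"R": 55, "RT": 208, "A": 136}

# Items carrying both A and RT (footnote 395):
# MPA# RT, MPACRT, MPAgRT, MPAlRT and MPART.
a_and_rt = 3 + 1 + 2 + 13 + 56


def internal_shares(counts):
    """Category-internal percentages, relative to the sum of the rows."""
    total = sum(counts.values())
    return {code: round(100 * n / total, 2) for code, n in counts.items()}


formal_total = sum(formal.values())                 # 415
incomplete_instances = sum(incomplete.values())     # 399 code instances
incomplete_items = incomplete_instances - a_and_rt  # 324 distinct items

print(internal_shares(formal))
print(internal_shares(incomplete))
print(round(100 * formal_total / TOTAL_ITEMS, 2))      # 16.6
print(round(100 * incomplete_items / TOTAL_ITEMS, 2))  # 12.96
```

Subtracting the 75 doubly coded items yields the number of distinct words (324), whereas the category-internal shares are taken over all 399 code instances, as the footnote specifies.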
The second most important subcategory is made up of words with vowel mutation - almost immediately followed by vowel gradation differences between complex words and their constituents. 396 Incomplete analysability unites three different subtypes: only about 14% of the affected words feature a free base and a remainder that cannot be connected to any other lexical element. Far more frequently, the unmotivatable remainder formally resembles another word or word formation element. With more than 50% of incompletely analysable items containing a transparent remainder, the category RT must not be neglected in a description of the motivatability of the German vocabulary 397 - nor must the fact that 136 of the 2,500 words - i.e. 5.44% - contain one or two affixes, but no synchronically discernible base. Almost 9% of the words contain self-elaborated affixes. 398 Motivatability by a grammatical polyseme only plays a minor role, but as this type of motivation was only admitted if no other motivating elements could be found, the impact as such should actually be larger. Most of the 4.12% words with marked constituents are labelled by the dictionaries, and only 31 words are marked with the code LI for self-elaborated labels. Motivatability by clippings, which occurs in less than 1% of the list items, can be practically disregarded.

396 The 10 items with pronunciation obstacles only can be supplemented by five completely motivatable words with the same feature, but no fully motivatable items join the two words with spelling difficulties only.
397 Again, there are more cases where transparent semantically unrelated elements combine with truly motivating elements (208x) than words that are purely transparent but unmotivatable (133x).
398 To these can be added 148 MO items with a self-elaborated lexical affix and 78 with an unrecorded grammatical affix.

4.2.5.1 Motivatability and frequency

Figure 11 shows the association between the words’ rank and the four motivational categories.

Figure 11: Motivatability and rank of the DWDS items
[Figure not reproduced; x-axis: Rank (0 to 2,500), y-axis: 0.0 to 1.0, curves: MP, UN, MO, UT]

The most frequent items are predominantly unmotivatable, but the proportion of these completely unmotivatable words experiences a relatively sharp drop in the first 500 items. The proportion of unmotivatable but transparent words rises very slightly and then goes down again to remain relatively stable. By contrast, both partial and full motivatability show a very clear rise, the curve of partial motivatability being much steeper with the exception of a few MOg-outliers in the highest-frequency ranges, 399 but after about rank 1,000, both graphs rise almost parallel to each other. In Figure 12, the four categories mentioned above are reduced to motivatability and opacity only. The point where the two lines cross - just under rank 800 - indicates the change from a predominantly unmotivatable to a predominantly motivatable vocabulary. The steepness of the curve in Figure 12 suggests that lower-frequency words should be more motivatable than high-frequency words; however, the rise is only gradual, so that only extremely low-frequency words would result in marked differences from the present study’s results.

399 Thus, the first completely motivatable item is eine ‘a’, which can be subdivided into the base ein and the grammatical feminine ending -e.

Figure 12: Motivatability and rank of the DWDS items: simplified version
[Figure 12 not reproduced; x-axis: Rank (0 to 2,500), y-axis: 0.0 to 1.0, curves: M, U]

4.2.5.2 Motivatability and word length

Table 84: Word length and motivatability of the DWDS items: in percentages
Characters | MO | MP | U | Tokens
1 | - | - | 100.00 | 1
2 | - | 14.81 | 85.19 | 27
3 | - | 22.22 | 77.78 | 90
4 | 4.45 | 19.43 | 76.11 | 247
5 | 8.39 | 33.23 | 58.39 | 322
6 | 9.16 | 47.03 | 43.81 | 404
7 | 14.75 | 58.41 | 26.84 | 339
8 | 19.88 | 63.86 | 16.27 | 332
9 | 26.92 | 65.03 | 8.04 | 286
10 | 33.33 | 57.78 | 8.89 | 180
11 | 40.00 | 56.00 | 4.00 | 125
12 | 39.71 | 58.82 | 1.47 | 68
13 | 34.21 | 65.79 | - | 38
14 | 42.86 | 57.14 | - | 21
15 | 77.78 | 22.22 | - | 9
16 | 80.00 | 20.00 | - | 5
17 | 100.00 | - | - | 1
18 | 75.00 | 25.00 | - | 4
21 | - | 100.00 | - | 1

In analogy to the results for English, the shortest words are usually unmotivatable, while the longest words are generally motivatable - all of those containing more than twelve letters, in fact. The dark cells in Table 84 illustrate the shift from the absence of motivatability to partial motivatability and then to full motivatability - a linear trend that is only slightly blurred by some minor jumps, particularly in categories with extremely few members. In this respect, English and German words behave very similarly. In both languages, words consisting of up to five letters are predominantly unmotivatable in contrast to the items of six letters and more. However, the predominance of full motivatability in German sets in later than in English, namely in 15- rather than in 12-letter words - though it must be added that the swing is far more dramatic numerically than in English. Furthermore, the complete lack of unmotivatable items in English sets in earlier than in German, namely with twelve and thirteen letters respectively.

4.2.5.3 Motivatability and etymological origin

Table 85 indicates that the motivatability of the DWDS items is intrinsically linked to their etymological origin: while more than two-thirds of the Germanic words are motivatable, the Romance items’ motivatability is slightly below 50%.
By contrast, the words with a Germanic-and-Romance origin are all motivatable, i.e. to 100% - for the same reasons as those laid out in the corresponding English Section, 4.1.6.4.

Table 85: Simplified motivatability and etymological origin of the DWDS items: in percentages
Origin | M | U | Tokens
Germanic | 68.25 | 31.75 | 2022
Romance | 48.90 | 51.10 | 364
Germanic-and-Romance | 100.00 | - | 100
Excluded | 30.76 | 69.23 | 13
Unknown | 100.00 | - | 1

The more detailed results below demonstrate that full motivatability is also highest in the category of mixed origin and that the Germanic words beat the Romance words with respect to their proportion both in the category of full and of partial motivatability. The difference between the motivational behaviour of the Germanic and the Romance words is highly significant (p < 0.00001) according to a Chi square test. This is mainly due to differences in full motivatability and the proportion of completely unmotivatable items. Of the thirteen words with a more exotic provenance, which are classified as excluded, two-thirds are not motivatable.

Table 86: Motivatability and etymological origin of the DWDS items: in percentages
Origin | MO | MP | UT | UN | Tokens
Germanic | 18.60 | 49.65 | 5.64 | 26.11 | 2022
Romance | 4.67 | 44.23 | 4.67 | 46.43 | 364
G.-and-R. | 47.00 | 53.00 | - | - | 100
Excluded | 15.38 | 15.38 | 15.38 | 53.85 | 13
Unknown | - | 100.00 | - | - | 1

As no Romance subcategories were distinguished in the German data, French and Latin influences cannot be measured. However, words of Greek origin were encoded.

Table 87: Motivatability of the DWDS items with Greek origin
Category | Code | Tokens | %
Full motivatability | MO | 7 | 7.29
Partial motivatability | MP | 42 | 43.75
No motivatability, but transparency | UT | 4 | 4.17
No motivatability | UN | 43 | 44.79
M (MO and MP together): 51.04%
U (UT and UN together): 48.96%

Table 87 shows the expected result: the words with Greek origin are clearly less motivatable than the average of 66.52% and also less transparent than the sample words in general.
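The Chi square comparison of the Germanic and the Romance words reported above can be approximated from Table 86. The Python sketch below rests on several assumptions that are not confirmed by the text: the absolute counts are reconstructed by multiplying the rounded percentages by the token totals, the test is run over all four motivational categories (three degrees of freedom), and no continuity correction is applied. It is a plausibility check, not a reproduction of the original test.

```python
import math

# Token totals and percentages copied from Table 86.
tokens = {"Germanic": 2022, "Romance": 364}
percent = {
    "Germanic": {"MO": 18.60, "MP": 49.65, "UT": 5.64, "UN": 26.11},
    "Romance": {"MO": 4.67, "MP": 44.23, "UT": 4.67, "UN": 46.43},
}

# Assumption: absolute counts are recovered by rounding percentage * total.
counts = {
    origin: {code: round(p * tokens[origin] / 100) for code, p in row.items()}
    for origin, row in percent.items()
}


def chi_square(table):
    """Pearson chi-square statistic for a dict-of-dicts contingency table."""
    rows = list(table)
    cols = list(table[rows[0]])
    row_sum = {r: sum(table[r].values()) for r in rows}
    col_sum = {c: sum(table[r][c] for r in rows) for c in cols}
    n = sum(row_sum.values())
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_sum[r] * col_sum[c] / n
            stat += (table[r][c] - expected) ** 2 / expected
    return stat


def p_value_df3(x):
    """Upper-tail probability of the chi-square distribution, 3 d.f. only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


statistic = chi_square(counts)
p = p_value_df3(statistic)
print(counts)
print(round(statistic, 1), p < 0.00001)
```

Under these assumptions the reconstructed counts add up to the token totals of 2,022 and 364, and the resulting p-value lies far below the reported threshold of 0.00001.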
With a motivatability rate of 51.04%, the difference from the motivatability of the Romance words without the Greek words, 51.57%, is only minimal. In analogy to English, the provenance of the 415 partially motivatable items with formal difficulties was tested. With 23 Germanic-and-Romance, 73 Romance and 319 Germanic items, the proportion of the Romance items in this subcategory - 17.59% - is higher than the average proportion of 14.56% Romance words. Consequently, it seems that from a formal point of view motivatability is easier when Germanic constituents are involved.

4.2.6 Expandability

By definition, items that are consociated in both directions and merely expandable items, both without further restriction, count towards full expandability, while merely motivatable and dissociated items are classified as unexpandable. The remaining items belong to the category of partial expandability. Table 88 and the summary in Table 89 reveal that practically all German items are expandable in some way or another: 88.36% of them fully and 8.12% partially. Only 88 of the 2,500 analysis words from the DWDS Core Corpus, i.e. 3.52%, cannot be expanded into a longer word. Even if expandability with its 33 subcategories is less extensive than motivatability, it is advisable to summarise the types of obstacles occurring in the expansions. The bold percentages in Table 90 relate to the number of total list items, while the unmarked figures represent the category-internal proportion. The fact that semantic differences are so infrequent can of course be explained by the fact that great care was taken to find the most suitable expansion for each word. Grammatical polysemy and longer synonyms only play a minor role as well. By contrast, formal differences are relatively frequent, as the number of 46 items demonstrates.
Within this category, simultaneous changes in spelling and pronunciation are the most frequent single subtype, but if FV and FVU are considered jointly, the changes involving vowels constitute almost half of all formal difficulties. The most important kind of obstacle pertains to stylistic markedness, though. 5.64% of all expanded items do not belong to the core vocabulary, but the 141 marked expansions are not all equally strongly marked. Thus, 8 items bear the same diasystematic label as their original base. The 77 items labelled with the diasystematic codes l or L can be assigned to the categories in Table 91.

Table 88: Expandability of the DWDS items
Category | Code | Tokens | %
Full expandability | total | 2209 | 88.36
 | B | 1434 | 57.36
 | E | 775 | 31.00
Partial expandability | total | 203 | 8.12
 | B+C | 10 | 0.40
 | B+F | 11 | 0.44
 | B+FL | 1 | 0.04
 | B+FS | 6 | 0.24
 | B+FV | 12 | 0.48
 | B+FVL | 1 | 0.04
 | B+FVU | 6 | 0.24
 | B+L | 38 | 1.52
 | B+l | 12 | 0.48
 | B+LC | 2 | 0.08
 | B+lC | 1 | 0.04
 | B+LI | 47 | 1.88
 | B+lI | 5 | 0.20
 | B+LIC | 1 | 0.04
 | B+Z | 5 | 0.20
 | E+# | 1 | 0.04
 | E+C | 3 | 0.12
 | E+F | 3 | 0.12
 | E+FS | 2 | 0.08
 | E+FSlI | 1 | 0.04
 | E+FV | 1 | 0.04
 | E+FVU | 2 | 0.08
 | E+L | 11 | 0.44
 | E+l | 8 | 0.32
 | E+L# | 2 | 0.08
 | E+LC | 1 | 0.04
 | E+LI | 7 | 0.28
 | E+lI | 2 | 0.08
 | E+LIC | 1 | 0.04
No expandability | total | 88 | 3.52
 | M | 67 | 2.68
 | N | 21 | 0.84

Table 89: Expandability of the DWDS items: simplified table
Category | Tokens | %
Full expandability | 2209 | 88.36
Partial expandability | 203 | 8.12
No expandability | 88 | 3.52
Full and partial expandability together: 96.48%
No expandability: 3.52%

Table 90: Partial expandability of the DWDS items: simplified table
Category | Code | Tokens | %
Formal differences | total | 46 | 1.84
 | F | 15 | 32.61
 | FS | 9 | 19.57
 | FW | 0 | 0.00
 | FV | 14 | 30.43
 | FVU | 8 | 17.39
Semantic differences | total # | 3 | 0.12
Marked expansions | total | 141 | 5.64
 | l | 21 | 14.89
 | L | 56 | 39.72
 | lI | 8 | 5.67
 | LI | 56 | 39.72
Expandability by a formally related longer synonym | total C | 19 | 0.76
Expandability by a grammatical polyseme | total Z | 5 | 0.20

Table 91: Diasystematic labels of the DWDS items
Label | Translation | Tokens
geschr | written | 32
geh./ bildungsspr. | formal | 7
gespr | spoken | 7
selten/ seltener | rare | 4
veraltend | obsolescent | 4
ugs./ fam. | informal | 3
dichter. | literary | 1
klass. Lit. | classical literature | 1
domains | | 25
Amtsdeutsch/ Amtsspr./ Admin/ Papierdt | administrative | 5
Mil. | military | 5
Ökon./ Wirtsch | economy | 5
Jur/ Rechtsw./ Rechtsspr. | legal | 4
Geogr. | geography | 1
Kfz.-T. | automobile mechanics | 1
Med | medicine | 1
Politik | politics | 1
Sprachw. | linguistics | 1
Versicherungsw. | insurance business | 1

Most of the marked German expansions are on a high stylistic level, regardless of whether they are marked as written or formal. Among the domains, the fields of administration, the military and economics are all equally important - in contrast to English, where computing and finance dominate. Seven of the German expansions are assigned two labels by the dictionaries, namely:
alsogleich | geh. veraltend
diesbezüglich | Admin geschr
Fehlbetrag | Admin geschr
In-sich-Geschäft | Rechtsw., Wirtsch.
Kosten-Nutzen-Analyse | Wirtsch., Politik
Nämlichkeit | Amtsspr., selten
Schadensfeststellung | Rechtsspr., Versicherungsw.

In analogy to the English distinction between markedly formal and markedly informal items, the dichotomy spoken/written is important numerically. However, no German expansions are marked as dialectal.

4.2.6.1 Expandability and frequency

In analogy to English, the association of the German words’ expandability with their corresponding rank will not be illustrated graphically, as little would be gained. Similarly, the constantly high proportion of expandability in German allows the conclusion that the lack of expandability should be a characteristic of extremely low-frequency words.

4.2.6.2 Expandability and source size

Table 92: Expandability of the DWDS items according to their sources
Source | Tokens | %
LGWDaF | 2055 | 82.20
UW | 163 | 6.52
GWDS | 46 | 1.84
DWDS | 147 | 5.88
IDS | 49 | 1.96
- | 40 | 1.60

For the vast majority of the words, an expansion can be found within the learner’s dictionary.
The decreasing proportion of expansions from the Universalwörterbuch and the Großes Wörterbuch der deutschen Sprache is also in line with the analyst’s expectations. By contrast, the increase in expansions from the DWDS Core Corpus was not foreseen. As this may be due to the large number of ad-hoc formations in the corpus, there is a clear need to investigate the frequency of the DWDS expansions.

Table 93: Frequency of the DWDS expansions

| DWDS freq. | Number of DWDS exp. with this freq. |
|---|---|
| 1 | 59 |
| 2 | 24 |
| 3 | 13 |
| 4 | 6 |
| 5 | 3 |
| 6 | 5 |
| 7 | 3 |
| 8 | 1 |
| 9 | 5 |
| 10 | 1 |
| 11 | 3 |
| 12 | 2 |
| 13 | 3 |
| 15 | 1 |
| 19 | 1 |
| 20 | 1 |
| 22 | 1 |
| 23 | 1 |
| 25 | 1 |
| 27 | 1 |
| 31 | 1 |
| 33 | 1 |
| 34 | 1 |
| 38 | 1 |
| 50 | 1 |
| 51 | 1 |
| 58 | 1 |
| 75 | 1 |
| 95 | 1 |
| 193 | 1 |
| 242 | 1 |
| 754 | 1 |

With 59 tokens, 40.14% of the DWDS expansions occur only once, [400] whereas at the other end of the scale, there are 12 words with a frequency of more than 30 in the DWDS Core Corpus which are missing in the dictionaries. [401]

[400] Of those items, 13 are fully motivatable, 30 partially motivatable, 8 completely unmotivatable and 8 unmotivatable but transparent.
[401] In order of increasing frequency, those are Man-selbst, Sonstiges, Befassung, Miteinandersein, Wiedererrichtung, irgendwelchem, Einzelindividuum, Fliegeroberleutnant, Staatssozialismus, Zurückziehung, Herbeiführung and irgendwelchen. However, these words are certainly not central enough to the vocabulary to recommend their inclusion as lemmas in the dictionaries.

With the IDS Corpora as the largest publicly accessible edited corpus of German available, the words for which no expansion could be found in the DWDS Core Corpus were subjected to a further search. For 49 of those 89 items, longer words containing the item could be found, but this was not considered in the results of the analysis. Of these expansions, the tautological bekanntlicherweise as well as schaffbar and Who-is-Who are particularly remarkable, as they occur 41, 44 and 52 times respectively among the corpus’ 1,527 million word forms and may even merit inclusion in the dictionaries.

4.2.6.3 Expandability and word length

Table 94: Word length and expandability of the DWDS items: in percentages

| Number of characters | Full exp. | Partial exp. | No exp. | Tokens |
|---|---|---|---|---|
| 1 | 100.00 | - | - | 1 |
| 2 | 85.19 | 11.11 | 3.70 | 27 |
| 3 | 92.22 | 6.67 | 1.11 | 90 |
| 4 | 92.31 | 5.67 | 2.02 | 247 |
| 5 | 91.61 | 5.59 | 2.80 | 322 |
| 6 | 93.56 | 3.47 | 2.97 | 404 |
| 7 | 87.91 | 9.44 | 2.65 | 339 |
| 8 | 83.43 | 12.05 | 4.52 | 332 |
| 9 | 82.87 | 12.24 | 4.90 | 286 |
| 10 | 88.33 | 5.00 | 6.67 | 180 |
| 11 | 91.20 | 5.60 | 3.20 | 125 |
| 12 | 79.41 | 16.18 | 4.41 | 68 |
| 13 | 81.58 | 18.42 | - | 38 |
| 14 | 80.95 | 9.52 | 9.52 | 21 |
| 15 | 66.67 | 22.22 | 11.11 | 9 |
| 16 | 100.00 | - | - | 5 |
| 17 | - | 100.00 | - | 1 |
| 18 | 50.00 | 50.00 | - | 4 |
| 21 | 100.00 | - | - | 1 |

Excluding the one-letter word and the items with a length of 15 characters and over for statistical reasons, the figures for synthetic word-family integration oscillate between 66.67% and 100% for full expandability and roughly 3% to 22% for partial expandability. On the whole, the majority of the words in the German sample are not only expandable but even fully so; however, it seems that fewer words tend to be expandable with growing word length, even if the high variability makes the trends less clear than in English.

4.2.6.4 Expandability and etymological origin

Table 95: Expandability and etymological origin of the DWDS items: in percentages

| Etymological origin | Full exp. | Partial exp. | No exp. | Tokens |
|---|---|---|---|---|
| Germanic | 88.13 | 7.72 | 4.15 | 2022 |
| Romance | 90.93 | 8.79 | 0.27 | 364 |
| Germ.-and-Rom. | 83.00 | 14.00 | 3.00 | 100 |
| Excluded | 92.31 | 7.69 | - | 13 |
| Unknown | 100.00 | - | - | 1 |

As far as the synthetic part of consociation is concerned, the German items display a tendency to be fully expandable, regardless of their etymological origin. The figures for the Germanic and the Romance items are quite close, but the Romance words have the edge in terms of both full and partial expandability. These minor differences combine in the complementary category of unexpandable items, to yield a small yet clear difference of 3.88%.
It is therefore possible to conclude that the German items with a Romance origin are more easily expandable than their Germanic counterparts, even if this difference must not be stressed too much. With 83% fully and 14% partially expandable items, the figures for the Germanic-and-Romance items take up tendencies of Germanic and Romance items respectively by comprising even fewer fully expandable items than the Germanic words and even more partially expandable items than the Romance words. This is also reflected in their unexpandability rate of 3%, which lies between the figures for the two pure etymological groups. The excluded items are expandable by 100%, the vast majority even fully so.

4.2.7 Consociation

The figures [402] in Table 96 give clear evidence to suggest that the German language is consociated: of the 2,500 words that were analysed in the present study, only 21 are neither motivatable nor expandable. [403] This translates as 0.84%. Of the 2,479 consociated words, only 67, i.e. 2.68% of all items, are integrated into word families merely through motivatability. [404] More importantly from a quantitative point of view, 816 of the 2,500 list items, i.e. 32.64%, can be integrated into word families only by means of expansion. [405] However, as many as 1,596 items - and that constitutes the largest part with a proportion of 63.84% of the analysis words - are both motivatable and expandable. [406] Of these, the majority are subject to partial motivatability and full expandability.

[402] The words Faktor (B/MP), militärisch (B+/MO), oh (B+/MP) and jene (B+/MO) have been correctly coded here as belonging to the respective B-categories instead of the erroneous E-code, which was retained elsewhere. However, this has no substantial effect on the results.
[403] This corresponds to the summed figures in the row with the code N.
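The tallies quoted in this paragraph can be re-derived from the cell counts of Table 96. The following is a minimal sketch rather than the author's procedure; the dictionary layout and the names `table_96`, `row_total` and `share` are illustrative assumptions, and the two partial-motivatability codes are collapsed into one column labelled "MO/MP".

```python
TOTAL_ITEMS = 2500  # size of the DWDS analysis list

# Cell counts transcribed from Table 96 (row = expandability code,
# column = motivatability code); cells marked "/" are omitted.
table_96 = {
    "B":  {"MU": 377, "MO/MP": 1058},
    "B+": {"MU": 50,  "MO/MP": 111},
    "M":  {"MU": 15,  "MO/MP": 52},
    "E":  {"UT": 114, "UN": 660},
    "E+": {"UT": 6,   "UN": 36},
    "N":  {"UT": 13,  "UN": 8},
}

def row_total(*codes):
    """Sum all cells of the given rows."""
    return sum(sum(table_96[code].values()) for code in codes)

def share(count):
    """Percentage of the 2,500 list items, rounded to two decimals."""
    return round(100 * count / TOTAL_ITEMS, 2)

both_directions = row_total("B", "B+")    # motivatable and expandable
only_motivatable = row_total("M")
only_expandable = row_total("E", "E+")
dissociated = row_total("N")

fully_consociated = table_96["B"]["MU"]   # strictest standard
partially_consociated = TOTAL_ITEMS - fully_consociated - dissociated

for label, count in [
    ("both directions", both_directions),
    ("motivatability only", only_motivatable),
    ("expandability only", only_expandable),
    ("dissociated", dissociated),
    ("fully consociated", fully_consociated),
    ("partially consociated", partially_consociated),
]:
    print(f"{label}: {count} ({share(count)}%)")
```

Summing the four groups returns the 2,500 list items, which is a quick sanity check on the transcription of the table.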
Table 96: Degrees of consociation of the DWDS items

| Degree | Code | MU | MO, MP | UT | UN |
|---|---|---|---|---|---|
| Full consociation | B | 377 | 1058 | / | / |
| Partial consociation | B+ | 50 | 111 | / | / |
| | M | 15 | 52 | / | / |
| | E | / | / | 114 | 660 |
| | E+ | / | / | 6 | 36 |
| No consociation | N | / | / | 13 | 8 |

In the light of the general tendency set out above it is not surprising that the merely motivatable words display partial rather than full motivatability and that the unmotivatable items tend to be fully rather than partially expandable. More importantly, Table 96 illustrates that only 377 of the German items can be considered completely consociated if the strictest standards are applied. [407] 2,102 more words are partially consociated, [408] and only 21 items cannot be related to a word family at all. Put differently, this means that only 15.08% of the words analysed are so strongly consociated that there can be hardly any disagreement on their status, but that it is possible to see some fairly satisfactory kind of family relationship in an additional 84.08%.

[404] This corresponds to the summed figures in the row with the code M.
[405] This corresponds to the summed figures in the rows E and E+.
[406] This corresponds to the summed figures in the rows B and B+.
[407] This corresponds to the darkest cell in Table 96 (row B, column MU).
[408] This corresponds to the medium dark cells in Table 96 (all remaining cells outside row N).

4.2.7.1 Consociation and frequency

Figure 13 provides evidence that there is a direct relation between a word’s frequency/rank and its degree of consociation. With the completely dissociated items playing only a minor role, the most important changes take place within the category of consociation. Thus, while consociation by motivatability only is practically negligible, consociation by expandability only and consociation in both the analytic and the synthetic direction behave like mirror-images of each other in Figure 13, and their respective graphs cross just above rank 800. With a few exceptions in the highest-frequency ranges, the items that are only expandable show a steady decrease, which is counterbalanced by the growth in words consociated both analytically and synthetically.

Figure 13: Consociation and rank of the DWDS items [plot; x-axis: rank, 0 to 2,500; y-axis: proportion, 0.0 to 1.0; series: B, E, M, N]

4.2.7.2 Consociation and word length

As expected, the shortest words are mainly integrated into word families on the basis of their high degree of expandability only - a consociation type which decreases with growing word length. By contrast, the proportion of words which are both motivatable and expandable shows a steady increase up to the nine-letter words and then fluctuates to reach 100% in the longest words. This is rather surprising, as the longest items might have been expected to be consociated by motivatability only. Instead, the proportion of the items that are consociated only by motivatability rises with intervening jumps and then drops to 0% in the longest words. However, this extreme value can be explained by the distortingly low number of representatives in the highest word-length ranges. Another conclusion that can be drawn from the figures in Table 97 is that no word of 11 letters or more is dissociated. As nine-letter words are also completely consociated, and as the behaviour of the ten-letter words may be considered to lie within statistical variability, one may even want to lower the threshold to nine-letter words.

Table 97: Consociation and word length of the DWDS items: in percentages

| Number of characters | B | M | E | N | Tokens |
|---|---|---|---|---|---|
| 1 | - | - | 100.00 | - | 1 |
| 2 | 11.11 | - | 85.19 | 3.70 | 27 |
| 3 | 22.22 | - | 76.67 | 1.11 | 90 |
| 4 | 22.27 | 1.21 | 75.71 | 0.81 | 247 |
| 5 | 40.37 | 1.24 | 56.83 | 1.55 | 322 |
| 6 | 53.22 | 2.72 | 43.81 | 0.25 | 404 |
| 7 | 71.39 | 2.06 | 25.96 | 0.59 | 339 |
| 8 | 81.02 | 2.71 | 14.46 | 1.81 | 332 |
| 9 | 87.06 | 4.90 | 8.04 | - | 286 |
| 10 | 85.56 | 5.56 | 7.78 | 1.11 | 180 |
| 11 | 92.00 | 3.20 | 4.80 | - | 125 |
| 12 | 94.12 | 4.41 | 1.47 | - | 68 |
| 13 | 100.00 | - | - | - | 38 |
| 14 | 90.48 | 9.52 | - | - | 21 |
| 15 | 88.89 | 11.11 | - | - | 9 |
| 16 | 100.00 | - | - | - | 5 |
| 17 | 100.00 | - | - | - | 1 |
| 18 | 100.00 | - | - | - | 4 |
| 21 | 100.00 | - | - | - | 1 |

If the German results are compared with those for English, it will be noticed that both languages show a very similar pattern with regard to the relation between degree of consociation and word length. Surprisingly, both languages fail to display a predominance of mere motivatability in the longest words present in the sample and rather show a majority of items integrated in both directions. This can be taken as an even stronger argument that the turning point may be expected in the lower frequency ranges, where words are supposedly longer.

4.2.7.3 Consociation and etymological origin

Table 98: Consociation and etymological origin of the DWDS items: in percentages

| Etymological origin | B | M | E | N | Tokens |
|---|---|---|---|---|---|
| Germanic | 64.99 | 3.17 | 30.86 | 0.99 | 2022 |
| Romance | 48.35 | - | 51.37 | 0.27 | 364 |
| Germ.-and-Rom. | 97.00 | 3.00 | - | - | 100 |
| Excluded | 30.77 | - | 69.23 | - | 13 |
| Unknown | 100.00 | - | - | - | 1 |

At 64.99%, the German words of Germanic origin have a very strong tendency to be both motivatable and expandable, whereas the Romance items are more or less evenly distributed between consociation in both directions and consociation by means of expandability only. Contrary to expectation, the Germanic items are slightly more frequent in the category of dissociated words than the Romance items, but only with a ratio of 0.99% versus 0.27%.
The most extreme results of the etymological categories with sufficient representatives to be of interest are reached by the Germanic-and-Romance words, with 97% of the items being both motivatable and expandable. They are also clearly above average in the category of the merely motivatable words, [409] and even more obviously below average in the category of the merely expandable items. About one-third of the excluded words with a more exotic provenience are both motivatable and expandable, and the remainder are integrated into word families by means of expansion only.

[409] Cf. Section 4.1.6.4 for an explanation.

4.2.8 Dissociation

Of the 2,500 words from the DWDS Core Corpus, only the 21 listed in Table 99 can be said to be dissociated in the sense that they are neither motivatable nor expandable in the sources used for the analysis. Yet even those items, which do not belong to any synchronic morpho-semantic word family, are not as antisocial as Leisi’s definition of dissociation may suggest, as 13 of them can be related to existing words via their transparent pseudoconstituents. As in the case of the English words, there is also one instance where the classification as dissociated may be too strict, namely in the case of the word obgleich ‘albeit’, which could actually be related to gleich in the meaning ‘no matter’. Furthermore, it is possible to find an expansion for 15 - i.e. 71.43% - of the dissociated items in the larger IDS Corpora. If these expansions were counted, one would be left with only 6 truly dissociated items, just as in English. [410] However, as the IDS Corpora are more than 15 times as large as the BNC, the IDS expansions cannot be considered here because this would prevent the comparability of the results.

[410] Those items are allerdings, damit, durchaus, indem, obgleich - which can be considered fairly motivatable - and zumal.

Table 99: The dissociated German items

| Lemma | Mot. by | M. | PoS | Ety. | Rank | Freq. | L. |
|---|---|---|---|---|---|---|---|
| denn | - | UN | conj | g | 86 | 104102 | 4 |
| sondern | - | UN | conj | g | 94 | 92652 | 7 |
| weil | - | UN | conj | g | 112 | 73412 | 4 |
| damit | [da, [mit | UT | conj | g | 117 | 70692 | 5 |
| vielleicht | [viel, [leicht | UT | adv | g | 172 | 43064 | 10 |
| heißen | - | UN | v | g | 211 | 36364 | 7 |
| sogar | [so, [gar | UT? | adv | g- | 282 | 28096 | 5 |
| schaffen | - | UN | v | g | 294 | 26739 | 8 |
| allerdings | [aller, [Ding, [-s | UT | adv | g- | 346 | 22065 | 10 |
| indem | [in, [dem | UT | conj | g | 403 | 19636 | 5 |
| freilich | [frei, [-lich | UT? | adv | g | 548 | 14709 | 8 |
| durchaus | [durch, [aus | UT | adv | g | 585 | 13895 | 8 |
| dennoch | [denn, [noch | UT | adv | g | 869 | 9362 | 7 |
| ziemlich | [ziemen, [-lich | UT | adv | g | 877 | 9301 | 8 |
| immerhin | [immer, [hin | UT | adv | g- | 1136 | 7235 | 8 |
| is | - | UN | v | g | 1481 | 5422 | 2 |
| Sir | - | UN | n | rGR | 1637 | 4918 | 3 |
| desto | - | UN | conj | g | 1731 | 4602 | 5 |
| zumal | [zu, [Mal | UT | conj | g | 1796 | 4404 | 5 |
| obgleich | [ob, [gleich | UT | conj | g- | 1815 | 4337 | 8 |
| lauter | [laut, [-er | UT? | det | g | 1877 | 4154 | 6 |

Table 100 lists the IDS expansions with their respective frequencies, three of which are remarkably high, namely in the case of Lauter-1er-Schüler, which occurs 26 times in the corpus, schaffbar with 44 tokens, and Who-is-Who with as many as 52 instances. By contrast, nine of the 15 expansions occur only once. With four exceptions, all of the IDS words contain at least two hyphens, which qualifies these expansions as phrase compounds.
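The rank, part-of-speech and length columns of Table 99 lend themselves to a quick tabulation. The sketch below is an illustration only: the tuple layout and all variable names are assumptions, and the data are simply transcribed from the table. It counts the dissociated items per rank band, measures the gaps between consecutive dissociated ranks, and computes the mean word length and the part-of-speech distribution.

```python
from collections import Counter

SAMPLE_SIZE = 2500  # number of ranks in the analysis list

# (lemma, part of speech, rank, length in characters), transcribed from Table 99
DISSOCIATED = [
    ("denn", "conj", 86, 4), ("sondern", "conj", 94, 7),
    ("weil", "conj", 112, 4), ("damit", "conj", 117, 5),
    ("vielleicht", "adv", 172, 10), ("heißen", "v", 211, 7),
    ("sogar", "adv", 282, 5), ("schaffen", "v", 294, 8),
    ("allerdings", "adv", 346, 10), ("indem", "conj", 403, 5),
    ("freilich", "adv", 548, 8), ("durchaus", "adv", 585, 8),
    ("dennoch", "adv", 869, 7), ("ziemlich", "adv", 877, 8),
    ("immerhin", "adv", 1136, 8), ("is", "v", 1481, 2),
    ("Sir", "n", 1637, 3), ("desto", "conj", 1731, 5),
    ("zumal", "conj", 1796, 5), ("obgleich", "conj", 1815, 8),
    ("lauter", "det", 1877, 6),
]

ranks = sorted(rank for _, _, rank, _ in DISSOCIATED)

# How many dissociated items fall into each rank band
in_top_1000 = sum(r <= 1000 for r in ranks)
in_1001_2000 = sum(1000 < r <= 2000 for r in ranks)
beyond_2000 = sum(r > 2000 for r in ranks)

# Largest stretch without dissociation between two dissociated items,
# and the stretch from the last dissociated item to the end of the list
largest_internal_gap = max(b - a for a, b in zip(ranks, ranks[1:]))
tail_gap = SAMPLE_SIZE - ranks[-1]

mean_length = sum(length for _, _, _, length in DISSOCIATED) / len(DISSOCIATED)
pos_counts = Counter(pos for _, pos, _, _ in DISSOCIATED)

print(in_top_1000, in_1001_2000, beyond_2000)
print(largest_internal_gap, tail_gap)
print(round(mean_length, 2), dict(pos_counts))
```

With only 21 data points, such a tabulation describes the sample and does not by itself support an inferential claim.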
Table 100: IDS expansions of the dissociated German items

| Lemma | PoS | Expansion | Tokens |
|---|---|---|---|
| denn | conj | Eigentlich-ist-alles-egal-denn-ich-bin-gerne-underdog-Charme | 1 |
| sondern | conj | Nicht-nur-sondern-auch | 1 |
| weil | conj | Nein-Weil-Position | 3 |
| damit | conj | - | |
| vielleicht | adv | Vielleicht-bald-Millionärin | 3 |
| heißen | v | Was-weiß-ich-wie-er-heißt | 1 |
| sogar | adv | Sogar-mehr als-Halbe-Halbe-Mann | 1 |
| schaffen | v | schaffbar | 44 |
| allerdings | adv | - | |
| indem | conj | - | |
| freilich | adv | selbstverfreilich | 1 |
| durchaus | adv | - | |
| dennoch | adv | Dennoch-Optimismus | 1 |
| ziemlich | adv | ziemlich-weit-rechtsaußen | 1 |
| immerhin | adv | “Es-war-ja-immerhin-lieb-gemeint-Geschenke” | 1 |
| is | v | Who-is-Who | 52 |
| Sir | n | Sir-Titel | 4 |
| desto | conj | Je-lauter-desto-besser-Moped | 1 |
| zumal | conj | - | |
| obgleich | conj | - | |
| lauter | det | Lauter-1er-Schüler | 26 |

4.2.8.1 Dissociation and frequency

Table 99 shows that dissociation is particularly common among the most frequent items in the sample under consideration. Thus, 14 instances of dissociation can be found among the 1,000 most frequent items, and another seven in the ranks between 1,001 and 2,000. By contrast, there is not a single dissociated item between ranks 2,001 and 2,500 - which becomes particularly noteworthy if one considers that there is no dissociation for 623 ranks, while the largest gap up to that point comprises only 345 places. However, with as few dissociated words as there are in the sample, no truly statistically relevant conclusion - such as the conclusion that the less frequent words are more unlikely to be dissociated - can be drawn from this.

4.2.8.2 Dissociation and part of speech

In terms of part of speech, one may expect the dissociated words to belong predominantly to the category of functional, closed-class words. Indeed, there are eight conjunctions, eight adverbs [411] and one determiner among the dissociated items, as against three verbs and one noun. [412] Among the dissociated words that are not even expandable in the IDS Corpora, the ratio is four conjunctions to two adverbs, which confirms the tendency outlined above.

4.2.8.3 Dissociation and word length

The 21 dissociated words are between two and ten characters in length. The average length is 6.33 characters - a figure that is very clearly below the general average of 7.22 characters.

4.2.8.4 Dissociation and etymological origin

As far as the language of origin of the dissociated items is concerned, all but one of the words are of Germanic origin. [413] The only Romance item, Sir, is a loan word from English, so that a proximate-language account may even state that all of the dissociated German items are of Germanic origin. Of course, the analysis sample contains more Germanic than Romance items, and the number of dissociated items as such is relatively low. However, a ratio of 95.24% or 100% Germanic dissociated items in contrast to 80.88% in the whole sample is nevertheless large enough to conclude that a Romance origin is not a factor that makes it more probable for a German word to be dissociated.

[411] In German, adverbs can be considered a grammatical category (cf. Bußmann 2002 s.v. Adverb).
[412] Part of speech was not encoded for the German language in general, but only for the dissociated items. This was done on the basis of the detailed part-of-speech-tagged DWDS frequency list mentioned in Section 3.1.2.2. Lauter, however, is labelled determiner on the basis of its functional similarity to the English class of determiners as used here in spite of its original classification as an attribuierendes Indefinitpronomen ohne Determiner.
[413] Four of these twenty etymologies are self-elaborated.

4.3 English vs. German

Important and relevant as the description of the individual languages for their own sake may be, deeper insights into the respective consociation and dissociation of English and German can only be gained by contrasting the results for the two languages with each other. For the sake of concision, though, only a selection of the most important combined variables will be contrasted.

4.3.1 Word length

The contrastive summary in Table 101 illustrates that the English words are shorter than their German counterparts: the longest German word exceeds the longest English item by as many as seven letters, and the largest English category is that of the four-character words, as against a maximum of six-character words in German. Even more importantly, about 67% of the English and German words are covered by the combination of four to seven and five to nine letters respectively. These tendencies are further supported by an average word length of 6.18 characters in English and 7.22 in German.

Table 101: Contrastive length of the analysis items in number of characters: in percentages

| Number of characters | BNC | DWDS |
|---|---|---|
| 1 | 0.08 | 0.04 |
| 2 | 1.28 | 1.08 |
| 3 | 6.16 | 3.60 |
| 4 | 19.48 | 9.88 |
| 5 | 17.20 | 12.88 |
| 6 | 16.84 | 16.16 |
| 7 | 13.48 | 13.56 |
| 8 | 9.08 | 13.28 |
| 9 | 7.00 | 11.44 |
| 10 | 4.44 | 7.20 |
| 11 | 2.84 | 5.00 |
| 12 | 1.24 | 2.72 |
| 13 | 0.56 | 1.52 |
| 14 | 0.32 | 0.84 |
| 15 | - | 0.36 |
| 16 | - | 0.20 |
| 17 | - | 0.04 |
| 18 | - | 0.16 |
| 21 | - | 0.04 |

4.3.2 Frequency

The German frequency values are more extreme than the English ones, with the most frequent item being almost twice as frequent as its counterpart, and the least frequent one featuring slightly below the figures for English. The overall frequency of the German items is far higher than in English, with an average of 29,748 as against 7,431.

4.3.3 Morphology

4.3.3.1 Compounds

At 133 vs. 50, the number of compounds on the German list is more than twice as large as in English. As compounds are motivatable by definition, this phenomenon might lead to an unfortunate asymmetry between English and German.
Indeed, the 133 compounds in the German sample turn out to be motivatable without exception, 40 of them even completely. Consequently, all compounds are either consociated by motivatability only or in both directions, the latter being the case for 114 of the 133 German compound words. An analogous phenomenon can be observed in the 50 English compounds, 18 of which are fully and 32 of which are partially motivatable. Similarly, 7 of the 50 English compounds are consociated through motivatability only, and the other 43 are both motivatable and expandable. [414]

4.3.3.2 Affixes

The German items contain more affixes than the English list words. Thus, there are not only 62 German prefix and 87 suffix types as against 32 English prefix and 77 suffix types, but also more than three times as many German prefix tokens. As for the suffix tokens, German exceeds English with 798 vs. 717. Furthermore, the most frequent German affixes are far more frequent than their English counterparts, e.g. ver- (50x) vs. re- (15x), or -ung (143x) vs. -ion (61x). Only the most frequent English grammatical suffix, -ly (82x), is more frequent than its German counterpart -e (35x). In contrast to English, the German language sample contains a grammatical prefix type as well as four kinds of linking morpheme. While the token-type-ratio (TTR) of English and German lexical suffixes is highly similar at 8.71 as against 9.10, with German lexical suffixes occurring on average in slightly more words, the German prefixes (TTR = 5.44) are definitely more productive than their English counterparts (TTR = 3.06).

Only 16 of the 109 English affix types are not recorded in the learner’s dictionaries, which corresponds to a ratio of 14.68% of the types and merely 7.48% of the tokens. In German, by contrast, even if the Universalwörterbuch is also admitted as a source for the affixes, 32.03% of the affix types and 39.83% of the tokens are self-elaborated.
These figures rise to as much as 42.48% of the types and 43.39% of the tokens if only LGWDaF affixes are admitted. [415]

[414] See Section 4.3.5 and Section 4.3.6 for the calculation of the motivatability and expandability of the English and German words without the compounds.
[415] The large number of necessary self-elaborated affixes in German, most of which are described in other reference works, justifies the procedure adopted here in retrospect.

With English and German lexicographical traditions as different as those pertaining to the inclusion of affixes in dictionaries, the listing of affixes in particular dictionaries cannot represent a good and objective criterion in contrastive analyses. For instance, while the English learner’s dictionaries also contain unproductive affixes such as -th, the LGWDaF tends to include only productive derivational affixes. [416]

4.3.4 Etymology

Table 102 illustrates very clearly how much the two languages differ with respect to their etymology: while the 2,500 most frequent German lexemes are predominantly of Germanic origin with only 15% Romance exceptions, the English items are more evenly split between the two categories, but with a clear preponderance of the Romance element. [417]

Table 102: Contrastive etymological origin: in percentages

| Category | BNC | DWDS |
|---|---|---|
| Germanic | 38.32 | 80.88 |
| Romance | 54.52 | 14.56 |
| Germanic-and-Romance | 5.68 | 4.00 |
| Excluded | 0.72 | 0.52 |
| Eponymic | 0.04 | - |
| Unknown | 0.72 | 0.04 |

Surprisingly, the difference in the number of items with a Greek origin is relatively small with 98 English as against 96 German items - which corresponds to a considerably larger part of the Romance vocabulary in German than in English.
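The closing remark on the Greek items can be made concrete with a little arithmetic. The sketch below rests on two assumptions that the text does not spell out at this point: that the Greek items are counted inside the Romance category, and that the absolute Romance counts can be recovered by applying the Table 102 percentages to the 2,500 list items of each language. The helper name `count_from_percent` is hypothetical.

```python
TOTAL_ITEMS = 2500  # list size per language

def count_from_percent(percent, total=TOTAL_ITEMS):
    """Recover an absolute count from a percentage of the list (assumed exact)."""
    return round(percent * total / 100)

# Romance share of the list according to Table 102
romance_items = {
    "BNC": count_from_percent(54.52),
    "DWDS": count_from_percent(14.56),
}

# Items of Greek origin as quoted in the running text
greek_items = {"BNC": 98, "DWDS": 96}

# Assumption: Greek items form a subset of the Romance category
greek_share_of_romance = {
    corpus: round(100 * greek_items[corpus] / romance_items[corpus], 2)
    for corpus in romance_items
}

print(romance_items)
print(greek_share_of_romance)
```

Under these assumptions the Greek items make up a share of the Romance vocabulary that is several times larger in the German list than in the English one, which is the point of the sentence above.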
Table 103: Contrastive etymological origin of the affixes

| Category | BNC Types | BNC Tokens | DWDS Types | DWDS Tokens |
|---|---|---|---|---|
| Germanic | 32 | 278 | 109 | 1000 |
| Romance | 74 | 531 | 43 | 147 |
| Germanic-and-Romance | 1 | 1 | 1 | 3 |
| Homonymous | 2 | 5 | - | - |

Both the Germanic types and tokens are more frequent in German than in English, whereas the Romance types and tokens are more frequent in English than in German. The majority of the English affixations are motivatable by Romance affixes, while the majority of German affixations contain Germanic affixes.

[416] Information provided by Dieter Götz, one of the editors of the LGWDaF.
[417] There is one English etymological category that does not exist in the German sample, namely that of eponymic words, but due to the low number of representatives of this category, it is irrelevant in the contrastive comparison of most aspects.

Table 104: Contrastive token-type-ratio and etymological origin [418]

| | Prefixes, lex. (E) | Prefixes, lex. (D) | Prefixes, gramm. (E) | Prefixes, gramm. (D) | Suffixes, lex. (E) | Suffixes, lex. (D) | Suffixes, gramm. (E) | Suffixes, gramm. (D) | Link. morph. (E) | Link. morph. (D) |
|---|---|---|---|---|---|---|---|---|---|---|
| Germ. | 3.33 | 6.30 | - | 8.00 | 7.95 | 13.20 | 15.29 | 9.57 | - | 3.75 |
| Rom. | 3.08 | 1.27 | - | - | 9.27 | 4.16 | - | - | - | - |

Where categories exist in both English and German, grey boxes in Table 104 indicate in which language the token-type-ratio is higher. Romance lexical prefixes and suffixes are more productive in English, [419] while Germanic lexical prefixes and suffixes have a higher token-type-ratio in German. In both languages, the grammatical affixes are all of Germanic origin. English grammatical suffixes occur on average in more words than German grammatical suffixes. This can be explained by the fact that there are more inflectional morphemes in German than in English. The Germanic and the Romance lexical prefixes and suffixes in English have relatively close token-type-ratios, and consequently it can be concluded that the etymological origin is not a decisive factor in the productivity of English affixes. By contrast, etymology does seem to play an important role in German.
Thus, the token-type-ratio of Germanic lexical prefixes and suffixes is about three to five times as high as that of Romance lexical affixes, implying that Germanic affixes are far more productive than Romance affixes in the German language.

The comparison of the association of etymological origin and rank in both languages reveals that there is a direct relation: both in English and in German, the highest-frequency items are predominantly Germanic in origin. While the proportion of Germanic items decreases with rank, there is an inverse growth of Romance words. These tendencies are far more pronounced in English than in German.

[418] The abbreviation E stands for English and D for German.
[419] In this context, productivity is understood in the sense that an affix is part of established words, and all conclusions are drawn within a binary analysis approach.

4.3.5 Motivatability

The comparison between the motivational values of the English and German analysis items produces some of the most central results of the present study.

Table 105: Contrastive motivatability: in percentages

| Category | BNC | DWDS |
|---|---|---|
| Full motivatability | 13.48 | 17.68 |
| Partial motivatability | 46.12 | 48.84 |
| No motivatability, but transparency | 4.40 | 5.32 |
| No motivatability | 36.00 | 28.16 |

Most of the figures in the columns for the English and German analysis items in Table 105 are relatively similar. For instance, the difference in terms of transparency is only 0.92%. Partial motivatability is 2.72% higher in the German words, and 4.20% more DWDS items are fully motivatable. While these individual differences may seem relatively small, the tendency for the German words to be slightly more motivatable than the BNC items is reinforced when the opposite direction, namely the lack of motivatability, is considered. Here, the interlinguistic gap amounts to 7.84%.
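The differences discussed for Table 105 are plain subtractions of percentage points, as the following sketch illustrates. The dictionary and variable names are illustrative assumptions; the figures are transcribed from the table, and the last two lines add up full and partial motivatability per language.

```python
# (BNC, DWDS) percentages transcribed from Table 105
TABLE_105 = {
    "Full motivatability": (13.48, 17.68),
    "Partial motivatability": (46.12, 48.84),
    "No motivatability, but transparency": (4.40, 5.32),
    "No motivatability": (36.00, 28.16),
}

# DWDS minus BNC, in percentage points; a negative value means that
# the category is more common in the English list
gaps = {
    category: round(dwds - bnc, 2)
    for category, (bnc, dwds) in TABLE_105.items()
}

# Full and partial motivatability taken together, per language
motivatable_bnc = round(sum(TABLE_105[c][0] for c in list(TABLE_105)[:2]), 2)
motivatable_dwds = round(sum(TABLE_105[c][1] for c in list(TABLE_105)[:2]), 2)

for category, gap in gaps.items():
    print(f"{category}: {gap:+.2f}")
print(motivatable_bnc, motivatable_dwds, round(motivatable_dwds - motivatable_bnc, 2))
```

Rounding to two decimals is applied only to suppress floating-point noise; the inputs already carry two decimals.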
Table 106: Contrastive motivatability: simplified table: in percentages

| Category | BNC | DWDS |
|---|---|---|
| Motivatability | 59.60 | 66.52 |
| No motivatability | 40.40 | 33.48 |

If the motivational results are expressed in terms of an either-or analysis, both languages contain a majority of motivatable items among the 2,500 most frequent lexemes, but the German words take the lead by a margin of 6.92%. Motivatability in English and in German is significantly different (p < 0.00001 according to a Chi square test). It is highly probable that the large difference in unmotivatability is the main explanation for this result.

It has been mentioned before that for reasons of better comparability, it makes sense to calculate the motivational figures for English and German if the compounds in the two languages are not considered. The results are summarised in Table 107.

Table 107: Contrastive motivatability without compounds: in percentages

| Category | BNC | DWDS |
|---|---|---|
| Full motivatability | 13.02 | 16.98 |
| Partial motivatability | 45.76 | 47.66 |
| No motivatability, but transparency | 4.49 | 5.62 |
| No motivatability | 36.73 | 29.74 |
| Total number of items | 2450 | 2367 |

The comparison with Table 35 and Table 82 reveals that the effect on the general results is only minor. The proportion of motivatable items is slightly decreased in both languages - to 58.78% in English and to 64.64% in German, which translates as a difference of 5.86%. Overall, the omission of the compounds yields the expected result, namely that the figures for the two languages become closer to each other by 1.06%. However, this difference is only minimal.

If one compares the factors that produce some kind of difficulty in the analysis of motivatable items, one can conclude that four variables show very similar proportions in English and in German, namely motivatability by clippings only, the markedness of constituents, incomplete analysability and formal differences - the latter two of which are more important numerically.
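The significance statement made for the either-or comparison in Table 106 can be checked with a short calculation. The sketch below is not the author's procedure: it assumes that the test was run on the 2x2 table of motivatable vs. unmotivatable items, that the absolute counts follow from the percentages and the 2,500 items per language, and that no continuity correction was applied. The function names are hypothetical, and the p-value uses the closed form for one degree of freedom.

```python
import math

TOTAL_ITEMS = 2500  # list size per language

# Absolute counts reconstructed from the Table 106 percentages (assumption)
motivatable = {
    "BNC": round(59.60 * TOTAL_ITEMS / 100),
    "DWDS": round(66.52 * TOTAL_ITEMS / 100),
}
unmotivatable = {corpus: TOTAL_ITEMS - n for corpus, n in motivatable.items()}

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]],
    computed without continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def p_value_one_df(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

chi2 = chi_square_2x2(
    motivatable["BNC"], unmotivatable["BNC"],
    motivatable["DWDS"], unmotivatable["DWDS"],
)
p_value = p_value_one_df(chi2)

print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```

Under these assumptions the statistic lies well above the critical value for the 0.00001 level, in line with the result reported in the text; a test on the four-category version of the table would use three degrees of freedom and a different p-value formula.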
Table 108: Contrastive partial motivatability: in percentages

| Category | Code | BNC | DWDS |
|---|---|---|---|
| Formal differences | total | 15.92 | 16.60 |
| | F | 72.86 | 56.39 |
| | FS | 9.55 | 2.41 |
| | FW | 11.56 | 0.48 |
| | FV | 6.03 | 19.76 |
| | FVU | - | 20.96 |
| Semantic differences | total (#) | 13.64 | 23.56 |
| Incomplete analysability | total | 14.12 | 12.96 |
| | R | 15.94 | 13.78 |
| | RT | 40.58 | 52.13 |
| | A | 43.48 | 34.09 |
| Marked constituents | total | 2.12 | 4.12 |
| | L | 81.13 | 69.90 |
| | LI | 18.87 | 30.10 |
| Motivatability by a clipping or a formally related shorter synonym | total (C) | 0.52 | 0.84 |
| Motivatability by a grammatical polyseme | total (Z) | 15.80 | 5.08 |
| Self-elaborated affixes | total | 1.68 | 8.80 |
| | g | 19.05 | 32.73 |
| | l | 80.95 | 67.27 |

This similarity in the degree of formal difficulties is highly surprising, as one may have expected English with its notorious discrepancies between spelling and pronunciation to yield considerably higher results in this domain than German. The explanation for this phenomenon can be found in the proportion of the subcategories. Thus, the problems in the analysis of English words can be mainly traced back to combined spelling and pronunciation differences. This category is also the largest one in the German form-based motivational category, but changes in vowel quality - whether vowel gradation or the exclusively German vowel mutation - occupy a very prominent position as well. By contrast, formal differences related to either spelling or pronunciation are the second and third most important obscuring factors in English, but they play only a very minor role in German.

Within the category of incomplete analysability, the two languages also show different tendencies: thus, motivatability by affixes only relates to about 130 German items, but as many as 180 English words. Category-internally, this represents a difference of almost 10%, which may be attributed to the large number of English words of the oral type. The German incompletely analysable words, however, rather contain transparent rests in almost 12% more cases than their English counterparts.
This may be due to the large number of semantically obscured prefixations.

The relatively large number of German words whose constituents were not marked by the dictionaries but by the analyst indicates that the marking practice in the English and German dictionaries under consideration differs. Furthermore, only 1.68% of the English words require the intervention of a self-elaborated prefix or suffix. By contrast, 8.80% of the German items contain an affix that is not recorded in the learner’s dictionary LGWDaF or even in the larger monolingual Universalwörterbuch. 420

420 However, the tables in Section 4.2.3.2 show that online dictionaries such as grammis and canoo record most of those affixes required for a satisfactory analysis of the German vocabulary. Rather than pointing towards an inherent problem in the German affix system, these results therefore indicate that there is still scope for development in German lexicography. The excellent macro- and microstructure of the English learner’s dictionaries, which contain vast amounts of information on affixes, could be taken as a model in this respect.

It was to be expected that motivatability by a grammatical polyseme would play a more important role in the English than in the German words. The fact that figures are more than three times as high in English as in German may be taken as indicative of the different structure of the vocabulary of the two languages.

The most surprising difference lies in the domain of semantics, where one might have predicted a very similar ratio in both languages. That this particularly important factor that weakens consociative links is stronger in German than in English contravenes the original expectations. Two explanations can be put forward to account for this data. On the one hand, it is possible that the analyst with German as a native language was more reluctant to accept full motivatability from a semantic point of view in German high-frequency words that are likely to be treated as units from a psycholinguistic perspective. However, this effect may be relatively strong after almost twenty years of acquaintance with English words as well. More likely, then, the reason concerns the structure of the German language: thus, there are 337 prefix tokens in the German list words, and 589 analysis items contain a semantic obstacle. However, if these two variables are combined, there are as many as 190 words containing both prefixes and semantic obstacles - which means that about every second prefixed word contains some kind of semantic obstacle, and that roughly every third semantically difficult word in the list contains a prefix, even though prefixations only constitute about one-seventh of the German analysis items.

4.3.5.1 Motivatability and frequency

Motivatability grows with decreasing frequency, both in English and in German. However, German is the first of the two languages to reach a majority of motivatability with decreasing rank. 421

421 Cf. Figure 7 and Figure 12.

4.3.5.2 Motivatability and etymological origin

Table 109: Contrastive etymological origin of motivatable words: in percentages

CATEGORY                BNC      DWDS
Germanic                49.47    68.25
Romance                 63.32    48.90
Germanic-and-Romance    96.48   100.00
Excluded                55.56    30.76
Unknown                 33.34   100.00
Eponymic                 -        -

Table 109 illustrates what proportion of English and German words belonging to the various etymological categories are motivatable. Surprisingly, all categories but one display the same pattern, namely a majority of motivatability either in the English or in the German words, but not in both. Germanic-and-Romance words, which are predominantly motivatable in both languages to a very similar degree, constitute the only exception. The most interesting pattern emerges from the comparison of the Germanic and the Romance items, which behave approximately like mirror images of each other: while the English words with a Germanic origin are motivatable to only 49.47%, i.e. very slightly below 50%, the German items with a Germanic origin are overwhelmingly motivatable. By contrast, a motivatability rate of 48.90% for the Romance words in the German language contrasts with 63.32% motivatable Romance English words.

If the finer, category-internal distinctions are taken into account as well, the tendency for the Romance words to be more motivatable in English and for the Germanic words to be more motivatable in the German language is reinforced. Thus, 9.98% of the Romance words are fully motivatable in English, but only 4.67% in German, and 53.34% of the Romance words are partially motivatable in English as against only 44.23% in German. By comparison, 18.60% of the Germanic words are fully motivatable in German, but only 9.60% in English, and another 49.65% of the Germanic words show partial motivatability in German, as against 39.87% in English.

Table 110: Contrastive motivatability and etymological origin: in percentages

                        MO                MP
CATEGORY                BNC     DWDS      BNC     DWDS
Germanic                 9.60   18.60     39.87    49.65
Romance                  9.98    4.67     53.34    44.23
Germanic-and-Romance    76.06   47.00     20.42    53.00
Excluded                 -      15.38     55.56    15.38
Unknown                  5.56    -        27.78   100.00
Eponymic                 -       -         -        -

4.3.6 Expandability

Table 111: Contrastive expandability: in percentages

CATEGORY                 BNC     DWDS
Full expandability       87.96   88.36
Partial expandability    10.92    8.12
No expandability          1.12    3.52

As far as the expandability of their vocabulary items is concerned, the English and the German languages show an even more striking similarity than in the domain of motivatability. Thus, both languages are predominantly fully expandable, with German leading by only 0.40 percentage points. This is compensated by the English predominance in the category of partial expandability, with a difference of 2.80 percentage points, so that the English analysis items are ultimately more expandable by a margin of 2.40 percentage points.
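The overall margin, and a significance check of the kind applied to these figures, can be retraced with a short sketch. It rests on assumptions that the text does not spell out: that the test is a Pearson Chi square test of independence, and that the absolute counts can be recovered by applying each percentage in Table 111 to the 2,500 list items per language. The function names are illustrative only.

```python
import math


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for a table of observed counts (one row per language)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value(statistic, dof):
    """Upper-tail probability, using the closed forms that exist
    for one and two degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if dof == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")


SAMPLE_SIZE = 2500  # list items per language (assumed base of the percentages)

# Table 111: full, partial and no expandability, in percentages
percentages = {
    "BNC": [87.96, 10.92, 1.12],
    "DWDS": [88.36, 8.12, 3.52],
}

# Recover approximate absolute counts from the percentages
counts = {
    corpus: [round(p * SAMPLE_SIZE / 100) for p in row]
    for corpus, row in percentages.items()
}

statistic, dof = chi_square([counts["BNC"], counts["DWDS"]])
p = p_value(statistic, dof)

# Overall expandability = 100% minus the non-expandable share
margin = round((100 - 1.12) - (100 - 3.52), 2)

print(counts)
print(f"chi-square = {statistic:.2f}, dof = {dof}, p = {p:.2e}")
print(f"English lead in overall expandability: {margin} percentage points")
```

Under these assumptions the table has two degrees of freedom and the resulting p-value lies below 0.00001.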
This difference is significant (p < 0.00001 according to a Chi square test) and can be attributed to the relative proportions of partially expandable and non-expandable items.

It is interesting to note that the factors bringing about partial expandability are similarly distributed across the two languages from a quantitative point of view, with formal obstacles, which play a more important role in English than in German, representing the only exception. However, the difference of 2.60 percentage points is still relatively small. Within the form-related category, it is no surprise that only-written obstacles are more prominent in the English words - a phenomenon that could already be observed in the discussion of motivatability. Vowel changes, by contrast, are an exclusively German phenomenon in expandability. As far as marking is concerned, the overwhelming majority of the affected English words are labelled by the dictionaries, whereas marking by the analyst plays a relatively important role in the German sample.

Table 112: Contrastive partial expandability: in percentages

CATEGORY                                                   CODE   BNC     DWDS
Formal differences total                                           4.44    1.84
                                                           F      25.23   32.61
                                                           FS     24.32   19.57
                                                           FW     50.45    -
                                                           FV      -      30.43
                                                           FVU     -      17.39
Semantic differences total                                 #       0.20    0.12
Marked expansions total                                            6.48    5.64
                                                           l       7.41   14.89
                                                           L      85.80   39.72
                                                           lI      0.62    5.67
                                                           LI      6.17   39.72
Expandability by a formally related longer synonym total   C       0.16    0.76
Expandability by a grammatical polyseme total              Z       0.20    0.20

It has been mentioned before that for reasons of better comparability, it makes sense to calculate the expandability figures for English and German if the compounds in the two languages are not considered. The results are summarised in Table 113.

Table 113: Contrastive expandability without compounds: in percentages

CATEGORY                 BNC     DWDS
Full expandability       88.20   89.31
Partial expandability    10.94    7.77
No expandability          0.86    2.92

The comparison with Table 111 reveals that the effect is only minor.
Overall, the omission of the compounds yields the expected result, namely that the full expandability rate is slightly increased in both languages - understandably more so in German, from which more long and hard-to-expand compounds are omitted than from English. Partial expandability figures for English rise almost imperceptibly, and there is even a small unexpected decrease in German. All in all, the difference between the two languages is reduced to 2.06 percentage points - 0.34 less than for the list words in general, a change that is only minimal.

4.3.6.1 Expandability and source size

Table 114: Contrastive expandability and source size: in percentages

                      English   German
OALD/LGWDaF           82.84     82.20
SOED/UW                9.96      6.52
OED/GWDS               2.32      1.84
BNC/DWDS               3.76      5.88
No expansion found     1.12      3.56

If the proportional use of the English and German sources is contrasted, as in Table 114, it turns out that the number of items for which an expansion can be found in the learner’s dictionaries is almost identical. 422 While the large monolingual dictionaries are also relatively close with a proportion of 2.32% and 1.84%, the medium-sized dictionaries vary by 3.44 percentage points, with the SOED yielding better results than the UW. Even though 2.12 percentage points more German expansions come from the equalising 100-million-word corpus, 423 more items remain expansionless in the German than in the English list. 424

422 This result is relatively interesting from a quantitative point of view, as the number of potential expansions from the learner’s dictionary is much larger in German - the LGWDaF comprising about 71,500 lemmas in the sense defined in Section 3.4.1.1 - than in English with about 37,500 lemmas in the OALD.

423 58 English and 59 German expansions from the corpora are hapaxes - two figures that are close enough to justify either the inclusion or the exclusion of such data in the analyses.

424 These 3.56% non-expandable German words could be reduced to 1.60% if the IDS Corpora were permitted as an additional corpus, but their proportion would still remain higher than in English.

4.3.6.2 Expandability and etymological origin

Table 115: Contrastive expandability and etymological origin: in percentages

          FULL EXP.        PARTIAL EXP.     NO EXP.
          BNC     DWDS     BNC     DWDS     BNC    DWDS
Germ.     89.77   88.13     8.25    7.72    1.98   4.15
Rom.      86.21   90.93    13.65    8.79    0.15   0.27

Table 115 indicates that partial expandability and lack of expandability are more language-specific than language-of-origin-specific. Thus, more English than German items are partially expandable, irrespective of their Romance or Germanic origin. Conversely, more German than English items are not expandable, again regardless of their origin. The only exception to this general rule lies in the category of full expandability. Here, the Germanic words have a tendency to be more easily expandable in the English language, and the Romance items are more likely to be expandable in the German language. This result is quite surprising, as one may have expected the predominantly Germanic German language to provide a better expansion-ground for the Germanic items, and the mixed-to-predominantly Romance English language to increase the likelihood for Romance expansions. However, the differences between all of these results are actually so small that it is not possible to speak of a strong tendency here.

4.3.7 Consociation

Table 116: Contrastive consociation: in percentages

                         CODE   English   German
Full consociation        B      58.76     63.84
Partial consociation     M       0.88      2.68
                         E      40.12     32.64
No consociation          N       0.24      0.84

The comparison of the consociation figures for English and German yields some of the most important results of the present study. Thus, consociation in both the analytical and the synthetic direction is definitely stronger in German than in English with about 64% vs. 59%.
In addition, far more items show the fullest possible degree of consociation in German - 377 as against 299 in English. While consociation by means of motivatability only is also more prevalent in German, the integration of items into word families by means of expandability only is so dominant in English - with 40.12% as against 32.64% in German - that English dominates partial consociation by 5.68%. Most importantly, Table 116 also shows that contrary to prior expectations in the literature, it is not the English but rather the German language that shows a lower degree of consociation. However, a ratio of 0.84% as against 0.24% corresponds to a relatively small difference - even if the general differences between the two languages are highly significant (p < 0.00001 according to a Chi square test). 425

425 The same result is obtained if the uncorrected data is used (cf. Section 4.1.8 and Section 4.2.7).

Table 117: Contrastive consociative strength

                 English            German
CODE   POINTS    Tokens   Points    Tokens   Points
B      3         1468     4404      1596     4788
M      2           22       44        67      134
E      1         1004     1004       816      816
N      0            6        -        21        -
Total                     5425               5738

The number of dissociated items being higher in German than in English - 21 in contrast to only six -, the hypothesis that English is a dissociated language in contrast to German has to be refuted. In order to do justice to the varying degrees of consociative strength of the different categories, though, it is worth calculating the consociative strength of the two languages by assigning a certain number of points for each item in a particular category. Thus, the isolated N-items are not counted at all because they do not contribute to consociation. Words that are only expandable receive one point, only motivatable items two points, and items consociated in both directions three points. With 5,738 points, the consociative strength of the German words is 313 points, i.e. 5.77%, higher than that of English.
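The weighting scheme just described can be written out as a small sketch, which may help in retracing the point totals. The token counts are those printed in Table 117; the names `POINTS`, `TOKENS` and `consociative_strength` are introduced here for illustration only.

```python
# Points per item in each consociation category, as described in the text:
# B = motivatable and expandable, M = motivatable only,
# E = expandable only, N = not consociated.
POINTS = {"B": 3, "M": 2, "E": 1, "N": 0}

# Token counts as printed in Table 117
TOKENS = {
    "English": {"B": 1468, "M": 22, "E": 1004, "N": 6},
    "German": {"B": 1596, "M": 67, "E": 816, "N": 21},
}


def consociative_strength(tokens):
    """Sum of tokens weighted by the points of their category."""
    return sum(POINTS[code] * count for code, count in tokens.items())


strength = {lang: consociative_strength(t) for lang, t in TOKENS.items()}
gap = strength["German"] - strength["English"]
relative_gap = 100 * gap / strength["English"]

for lang, points in strength.items():
    print(f"{lang}: {points} points")
print(f"German lead: {gap} points ({relative_gap:.2f}% of the English total)")
```

With the token counts as printed, the German total of 5,738 points is reproduced, whereas the English rows sum to 5,452 rather than the printed 5,425. The gap and the percentage obtained from this sketch therefore differ slightly from those quoted in the text; which of the printed figures is affected cannot be determined from the table alone.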
This can be interpreted to mean that although there are more dissociated German words, the general strength of consociation is higher in German than in English. Even so, a difference of only 5.77% cannot be taken as a justification for calling the English vocabulary “antisocial” in contrast to the German vocabulary.

Table 118: Contrastive consociation and etymological origin: in percentages

            B                 M               E                N
            BNC     DWDS      BNC    DWDS     BNC      DWDS    BNC    DWDS
Germ.       48.02    64.99    1.46   3.17      50.00   30.86   0.52   0.99
Rom.        63.32    48.35    0.07   -         36.54   51.37   0.07   0.27
G.-a.-R.    91.55    97.00    4.93   3.00       3.52   -       -      -
Excl.       55.56    30.77    -      -         44.44   69.23   -      -
Unkn.       33.33   100.00    -      -         66.67   -       -      -
Epon.       -        -        -      -        100.00   -       -      -

The chess-board-like distribution of the dark cells in the upper left-hand corner of Table 118 indicates that the Germanic and the Romance items behave differently in English and German with respect to their consociation. Thus, a majority of German Germanic items and of English Romance items show the highest degree of consociation, namely that of the B-type involving both analytical and synthetic word family integration. Also, more German Germanic and more English Romance words are consociated merely by motivatability. These results are in line with the fact that more German Romance and more English Germanic items are consociated by expandability only - which constitutes the lowest degree of word-family integration.

Nonetheless, the above data also permits the surprising conclusion that it is firstly not the Romance, but rather the Germanic words that are more dissociated, and that secondly German rather than English is the language with the higher dissociation rate. This is exemplified by the increasing degree of dissociation, starting with Romance items in English, where the dissociation rate is 0.07%, then Romance items in German, which are dissociated to 0.27%, followed by 0.52% dissociated Germanic items in English, and last but not least 0.99% dissociated Germanic items in German.
It is also possible to deduce from this order that the Germanic origin has a more dissociating tendency than the language under consideration. However, the differences between the figures are only minimal and should not be stressed unduly.

Table 119: Contrastive etymological origin and consociative strength: in tokens

         B              M             E             N
         BNC    DWDS    BNC   DWDS    BNC   DWDS    BNC   DWDS
Germ.    460    1314    14    64      479   624     5     20
Rom.     863     176     1    -       498   187     1      1

If one calculates the consociative strength of the Germanic and the Romance English and German items according to the criteria set out above, the large number of Germanic B-consociated items in the German language results in a Germanic German consociative strength of 4,694 points, followed by Romance English words with 3,089 points, Germanic English words with 1,887 points, and finally 715 points for Romance German items. These weighted results may be a more reliable indicator of the actual situation. In any case, they show that the difference in the consociative strength of the English items of Romance and Germanic origin is less marked than in the German items of differing origin. If the figures for English and German are considered jointly, the consociative strength of Germanic vs. Romance words is 6,581 as against 3,804.

Table 120: Contrastive etymological origin and consociative strength: in points

         B               M              E             N
         BNC     DWDS    BNC   DWDS     BNC   DWDS    BNC   DWDS
Germ.    1380    3942    28    128      479   624     0     0
Rom.     2589     528     2    -        498   187     0     0

4.3.8 Dissociation

There are 21 dissociated items in German, but only 6 in English. Consequently, all of the following results are based on too few items to be truly meaningful and can only be indicative of certain tendencies. In both languages, it is possible to relate a majority of these words to other vocabulary items in some way, even though not well enough for them to deserve the label consociated. In both languages, the less frequent items are less dissociated.
Thus, behalf - rank 2,191 - is the least frequent English item affected by dissociation, while the German results are even more extreme with lauter as high up as rank 1,877. As for the part of speech, the German dissociated words mainly belong to the category of closed-class items. Open lexical categories seem to dominate in English, but the words in question are functionally like grammatical words as well. The word length of the dissociated items lies below the general average for both English and German. Both languages also agree in showing a preference for a Germanic origin of the dissociated items, the tendency being more marked in German. Consequently, dissociation cannot be generally attributed to a Romance origin of the word in question.

5 Discussion

The following sections summarise the most important results from Chapter 4 and discuss their relevance. The data is used to verify or disprove eight hypotheses based on prevailing views in the field of dissociation, and the results of the present analysis are compared with those of previous, related studies. Furthermore, this chapter offers alternative calculations of the results and also draws attention to the boundaries and limitations of the present study.

5.1 Testing the hypotheses

5.1.1 Hypothesis 1: Motivatability is higher in German than in English.

If Leisi’s concept of dissociation is understood in a very narrow sense, as in Burgschmidt’s (1976: 40) paraphrase, its defining requirement is reduced to the absence of motivatability, and one would expect motivatability to be far higher in German than in English. Whatever the approach that one decides to follow, though, it is first necessary to fix a limit at which one can speak of a dissociated language and to determine what ratio of consociation does not deserve this designation any more. The same is true of the delimitation of motivatability.
The very strictest possible solution, namely postulating that a single instance of motivatability in a sample of any size automatically prevents the language in question from being dissociated and vice versa, is not tenable. The 50% limit represents a plausible cut-off point because it is possible to speak of a majority - although the differences may be only minimal. A two-thirds majority is even more meaningful, 426 and if one of the values exceeds 80%, it is definitely legitimate to speak of a consociated or a dissociated language respectively.

426 After all, two-thirds majorities represent a sufficient proportion for decision-taking or for the overriding of vetoes in various political systems.

The 2,500 German analysis words show a relatively high motivatability rate of 66.52%. This pertains to more than half of the analysis items - two-thirds, to be more precise -, 427 but one may still argue that this figure is far from involving the vast majority of words. However, one must not forget the fact that the analysis items are the most frequent ones in the German language, and that motivatability increases with decreasing frequency, so that the proportion should be higher for the German vocabulary as a whole. Yet this phenomenon is also observable in English, so that the contrastive results may remain more or less uninfluenced by this general tendency.

427 German falls short of the two-thirds standard by so few percentage points - actually even less than a whole percentage point - that the figures can be rounded up.

When the same line of thought as above is adopted for English, the hypothesis that the English vocabulary is predominantly unmotivatable can be refuted if the motivatability rate exceeds 50%. At 59.60%, clearly more than half of the analysis items from the BNC lemma list are motivatable in some way or another.
While even generous rounding will not turn this figure into a two-thirds majority, the results for German and for English are nonetheless relatively close. A difference of only 6.92 percentage points cannot justify the categorisation of the German vocabulary as clearly motivatable and of the English vocabulary as clearly unmotivatable.

A more detailed comparison of the motivational behaviour of English and German reveals only minor differences: partial motivatability is 2.72 percentage points higher in the German items, and full motivatability is also 4.20 percentage points higher in the words from the DWDS Core Corpus. Contrary to expectations, the omission of compounds from the word lists does not lead to important changes. Surprisingly, formal differences pose an obstacle in roughly the same number of English and German words, but while the English items are almost exclusively affected by differences in both spelling and pronunciation, the German words are more or less evenly split between this type of obstacle and changes in vowel quality. Motivatability by a grammatical polyseme is more than three times as high in English as in German. The same proportion, but in the opposite direction, applies to obstacles of a semantic kind.

Summing up, there are differences between the languages as far as the motivational subcategories are concerned, which result in a German motivation rate that is roughly 7 percentage points higher. Thus, Hypothesis 1 is confirmed, but on a more general level, the picture that emerges is very similar for the 2,500 most frequent items in the two languages.

5.1.2 Hypothesis 2: Expandability is higher in German than in English.

If consociation is interpreted as a bidirectional phenomenon, it makes sense to compare the two languages with respect to their expansive qualities before combining both directions of analysis.
As the productivity of German word-formation is usually regarded as extremely strong, while Leisi (1961: 261 and 1999: 51-52) believes that English has lost its word-forming power, 428 one may conclude that expandability should be higher in German than in English. 429 However, the research results tell a different story. Altogether, 98.88% of the English list items can be expanded into a longer word, 87.96% of them even fully. At 88.36%, full expandability is only minimally higher in German, but due to a lower German partial expandability rate, overall expandability is even higher in English by a margin of 2.40 percentage points. Contrary to expectation, the omission of compounds has no influence on these general tendencies. Thus, Hypothesis 2 must be rejected for the samples under consideration: expandability is slightly yet significantly higher in English than in German.

428 However, this is not a statement that a majority of linguists would agree with. For instance, according to Scheler (1977: 108), English word formation has only become less productive but has not been given up completely, and Gelfert (2003: 71) even believes English to be more productive than the other occidental languages.

429 This argumentation would not work if the majority of the English word stock stemmed from the Old English period, for which a high productivity is recognised. However, the figures in Section 4.1.5.2 prove that this is not the case - at least not for the totality of the items under consideration. Furthermore, the proportion of words already attested in Old English decreases with decreasing frequency.

5.1.3 Hypothesis 3: German is a considerably consociated language.

Regardless of whether one defines consociation as a mono- or bidirectional phenomenon, Hypothesis 3, i.e. the claim that German is a language whose vocabulary tends to be consociated, can be confirmed for the high-frequency vocabulary. Thus, practically two-thirds of all the German words considered in the study can be consociated by some kind of motivation. If expandability is also accepted as a consociating factor, the proportion of consociated items rises to 99.16% - a result that speaks for itself. More than 60% of the items are both motivatable and expandable, whereas only 21 German words are not integrated into a word family in any way and are therefore dissociated.

5.1.4 Hypothesis 4: English is a considerably dissociated language.

The most important hypothesis in the light of the present study is claim number four, namely that English is a considerably dissociated language. If Leisi’s hypothesis is interpreted as consisting of an analytical component only, a consociative ratio of 59.60% can be considered enough to state that the English vocabulary does not tend towards dissociation but rather towards consociation - at least in the highest frequency ranges. This statement is even more valid if both directions of word-family integration are considered. The results of the present study indicate that the English vocabulary - at least its most frequent part - is by no means as isolated as is commonly claimed. Thus, only six out of 2,500 words are not integrated into a word family in any of the ways accepted in the present study. This translates as a ratio of 0.24% - a proportion that can hardly be considered enough to call English a dissociated language. As this figure is even lower than that for German - where the proportion amounts to 0.84% -, it supports the interpretation that the dissociation of the English vocabulary has been unduly stressed in the past.

5.1.5 Hypothesis 5: There are more Germanic than Romance words among the high-frequency lemmas.

Both English and German are classified as Germanic languages 430 and could therefore be expected to consist mainly of Germanic vocabulary items.
Due to its history and the large number of various influences, English contains a larger number of Romance words than German, but as Finkenstaedt and Wolff (1973: 119) have shown that their proportion decreases with increasing frequency, the high-frequency items from the BNC list may be expected to contain a majority of Germanic words. While Hypothesis 5 can be confirmed for the high-frequency German words, with 80.88% of the list items being of Germanic origin, it has to be rejected for the English words, of which only 38.32% have a purely Germanic background. These figures show that English is a mixed language throughout, and that even the high-frequency words are predominantly Romance words. However, as their 54.52% ratio lies only slightly above the 50% limit, the claim that English is actually a Romance rather than a Germanic language as far as its vocabulary is concerned can only be made with extreme caution for the frequency ranges under consideration.

430 Cf. Bußmann (2002 s.v. Germanisch).

5.1.6 Hypothesis 6: Motivatability is higher in Germanic than in Romance words.

Leisi (1999: 51) believes that the large number of words with a Romance origin is responsible for the low motivatability - and ultimately for the dissociation - of the English vocabulary. The discussion of Hypothesis 1 shows that this hypothesis is based on the erroneous assumption that English is an unmotivated language. Nonetheless, one may ask whether there is really a connection between a Germanic origin and motivatability and between a Romance origin and the lack thereof. Interestingly, the English and German analysis items behave in direct opposition to each other. Thus, 68.25% of the Germanic DWDS items are motivatable in contrast to only 49.47% of the Germanic BNC words, whereas 63.32% motivatable Romance words in English stand against only 48.90% motivatable Romance words from the German word list. 431 A look at the finer, category-internal distinctions reinforces this tendency for the Romance words to be more motivatable in English and for the Germanic words to be more motivatable in German: both full and partial motivatability of Romance items are higher in the BNC words, and both types of motivatability are higher in the Germanic items from the German sample. Thus, Hypothesis 6 can be neither confirmed nor rejected in its general form. Depending on the language under consideration, it is either true or false: true for German, but false for English. 432 The conclusion that can be drawn here is that the Romance words in German have retained a more alien character than in English.

Considering all this evidence, a tentative explanation for Leisi’s hypothesis can be seen in his native German background: as Romance words are less motivatable in German than the Germanic words, he may have subconsciously applied the German situation to the English vocabulary. This hypothesis is in line with the fact that the scholars who complain about the dissociated English language are typically German-speaking linguists. As native speakers of English see their own language as the starting point, they have little reason to believe that something is wrong with it. 433 Maybe that is why the theory of dissociation has mainly been advocated by German-speaking linguists who compare English with their own native language. 434

431 Note, however, that the minority figures are extremely close to 50% in both cases.

432 This may also be the reason why purist neologisms such as saywhat for definition or onwriting for inscription could rarely establish themselves as substitutes for seemingly hard words - which are actually often motivatable (cf. Fill 1980: 30-35).

433 Keller (1994: 23) remarks that linguistic decline is always only recognised in the language of others.

434 It is also possible to turn the tables, and start with English and point out the deficiencies of German. Thus, Kamin is unmotivatable in contrast to its English equivalent fireplace. The same is true of Sessel/armchair or Gurt/seatbelt. If additional languages such as Latin are drawn upon for the comparison, even more words become dissociated according to some of the readings of dissociation (cf. Chapter 1). For instance, both Pferd/reiten and horse/ride are unrelated, while equus/equitare are clearly linked morphosemantically. Consequently, it is advisable to consider each language individually and without letting the knowledge of other languages influence the conclusions arrived at.

5.1.7 Hypothesis 7: Consociation is higher in Germanic than in Romance words.

Hypothesis 7, that consociation is higher in Germanic than in Romance words, can be directly derived from Hypothesis 6 - indeed, it is nothing but the bidirectional interpretation of Leisi’s statement -, but it is easier to reject it in its general form. Thus, more Romance high-frequency words are consociated than Germanic words in both English and German. However, with a proportion of 99.48% consociated Germanic items in English and 99.01% in German as against 99.93% consociated Romance items in English and 99.73% in German, the differences are extremely small. Surprisingly, both the Germanic and the Romance items show a larger proportion of consociation in English - even though it is only minimally larger.

However, if the consociative strength is also taken into account, the picture that emerges is a different one. Then, the large number of German Germanic items that belong to the highest category of consociation, which involves both the analytical and the synthetic direction, has so much weight that it occupies the highest position with respect to consociation. Romance English words come next, followed by Germanic English and then by Romance German items.
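The contrast between the two rankings can be retraced from the token counts in Table 119. The following sketch computes, for each combination of origin and language, the plain consociation rate and the weighted consociative strength (three points for B, two for M, one for E, none for N). The variable and function names are illustrative, and a dash in the table is read as zero.

```python
# Points per consociation category, as set out in Section 4.3.7
POINTS = {"B": 3, "M": 2, "E": 1, "N": 0}

# Token counts from Table 119 (a dash in the table is read as 0)
TOKENS = {
    ("Germanic", "English"): {"B": 460, "M": 14, "E": 479, "N": 5},
    ("Germanic", "German"): {"B": 1314, "M": 64, "E": 624, "N": 20},
    ("Romance", "English"): {"B": 863, "M": 1, "E": 498, "N": 1},
    ("Romance", "German"): {"B": 176, "M": 0, "E": 187, "N": 1},
}


def consociation_rate(tokens):
    """Percentage of items that are consociated in at least one direction."""
    total = sum(tokens.values())
    return 100 * (total - tokens["N"]) / total


def consociative_strength(tokens):
    """Sum of tokens weighted by the points of their category."""
    return sum(POINTS[code] * count for code, count in tokens.items())


rates = {group: round(consociation_rate(t), 2) for group, t in TOKENS.items()}
strengths = {group: consociative_strength(t) for group, t in TOKENS.items()}

# Rank the four groups by each measure, highest first
by_rate = sorted(TOKENS, key=lambda g: consociation_rate(TOKENS[g]), reverse=True)
by_strength = sorted(TOKENS, key=lambda g: strengths[g], reverse=True)

for group in by_strength:
    origin, language = group
    print(f"{origin} {language}: {rates[group]}% consociated, {strengths[group]} points")
```

Ranked by rate, the Romance items in both languages come out ahead of the Germanic ones. Ranked by strength, the Germanic German items lead by a wide margin, which is the reversal described in the text.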
Taking everything into account, it seems that consociation as such is slightly higher in English and in Romance words, but if the strength of the consociative relations is considered as well, this statement has to be modified, with the results for German being more extreme than those for English, and with a higher score for the Germanic items.

5.1.8 Hypothesis 8: Old words are less motivatable but more expandable than recent words.

As complex words require a base, and as the bases might be assumed to enter the language before the complex items, 435 one may expect the words that have belonged to the word-stock of a particular language for a long time to be less motivatable than more recent additions. The table in Section 4.1.6.5 confirms the first part of this hypothesis. 436 Thus, there has been a steady diachronic growth in motivatability in the English list words, from 37.70% in the words with an Old English origin to 66.02% in Middle English additions and even 73.56% in the most recent items. In addition, the proportion of full motivatability has also grown, from 4.84% in words attested in Old English times to 13.58% in the items from the Middle English period and to 24.65% in the most recent words.

The second part of the hypothesis, namely that the expandability of the more recent words is lower than that of older items, is based on the assumption that words need a certain degree of integration into a language before they can be expanded, 437 which implies that the oldest words should have an advantage over the more recent formations or loans. This subhypothesis can also be confirmed: not only are the Old English words more expandable than the more recent vocabulary items, but they are even more frequently fully expandable. However, the decrease from 99.57% expandable items with an Old English origin to 98.76% expandability in Middle English additions and 98.41% in the more recent words is not very marked. By contrast, the proportion of 92.32% of fully expandable items with an Old English origin as against 86.97% fully expandable words from Middle English and only 84.69% of more recent fully expandable items is more meaningful. Nonetheless, one may assume that ongoing word-formation processes will not reduce the degree of consociation in English, as the lower-than-average expandability rate of neologisms is counterbalanced by their increased degree of motivatability.

435 Backformations and complex loan words represent an exception from this rule.
436 As period of origin was only encoded for the English words, the discussion will concentrate on English only.
437 Cf. Finkenstaedt and Wolff (1973: 161): “Productiveness and consociatedness seem to be fairly closely connected both with each other and with belonging to the ‘hard core’ of the lexicon”.

5.2 Alternative results

All the results given above are in accordance with the methodology described in Chapter 3. However, there are certain parameters on which there may be disagreement among different researchers. For this reason, the following sections present some alternative calculations that take such critical points into account. These calculations allow a comparison of the present study’s outcome with what the results would have been if different parameters had been chosen. However, it is not possible to simulate the outcome of completely different approaches, e.g. in the sense of a different dictionary basis or completely different methodology. 438 Even within the limits of what the present study permits without additional analyses, one cannot aim at completeness, as the multitude of parameters involved would make such an enterprise far too extensive. Consequently, the following pages can only introduce a few selected alternatives.
439 On the one hand, these alternative calculations attempt to simulate other analysts, in order to compensate for the fact that the present study is the work of a single researcher; on the other hand, they also aim to show the extremes between which consociation and dissociation can fluctuate, particularly with respect to the contrastive results.

438 Therefore, aspects such as the following are not treated in the alternative results section because they would require additional research or encoding:
1. motivatability by internationalisms, knowledge of foreign languages or foreign learners’ native language
2. motivatability of German verbs by the infinitive ending - plus reduction of motivatability within word-formations if the infinitive ending has to be omitted
3. replacement of partial expansions by full expansions if larger sources are admitted
4. potential expansion by compounds spelled as two words from corpora
5. categorisation of the German suffix -er as Germanic - in contrast to the approach adopted here, which treats it as Romance, and also in contrast to Fleischer and Barz (1995: 61), who stress its foreign origin.
439 These can be supplemented by additional analyses of the data base, which is available either directly from the author or via the publisher.

5.2.1 A more restrictive approach

In the first place, consociation could be defined as necessarily requiring motivatability, and the synthetic direction of expandability could be either disregarded or treated as an unimportant side aspect. This would result in a consociation rate of 59.60% for English and 66.52% for German as against 99.76% and 99.16% respectively in the original calculations.
In spite of the drastic reduction of consociation, which is accompanied by a reversal of the more consociated language - English in the present study’s result, German in the alternative calculation -, one may conclude that the figures are so close in both cases that the results would not contradict the central outcome of the present study, namely that English is not a dissociated language. Secondly, motivatability could be defined far more strictly categoryinternally than in the present approach. Thus, differences in only the spoken or written variety may be excluded from the category of full motivatability because of their minimally obscuring factor. Fully motivatable items with the code g could be excluded from the highest category as well because at least some of the words are merely motivatable by the grammatical suffix - which is a very peripheral type of motivatability. In addition, one may want to restrict full motivatability to cases where the affixes occur in the reference works only. If all of these factors are combined, full motivatability is reduced from 13.48% to 9.64% in English and from 17.68% to 8.48% in German. Again, the relations are reversed, but again, the differences between the languages are so small that they can be considered practically insignificant, even if one should use them as the sole basis for determining consociation. However, both English and German would then have to be considered dissociated languages. It is also possible to raise the standards for what is accepted as motivatable in general. Re-shuffling the subcategories permits exclusion of the different types of partial motivatability one by one. 440 Thus, all list words containing formal obstacles could be classified as unmotivatable because a normal speaker might not recognise the relation between the complex item and the base - a decision affecting 15.92% of the English and the very similar amount of 16.60% of the German items. 
Discarding the marked constituents on the same grounds would not radically change the relations either, as only 2.12% of English and 4.12% of German items are concerned. The same is true of the cases that are not completely analysable: the assumption that speakers do not analyse such words at all would reduce motivatability in English by 14.12% and motivatability in German by 12.96%. Rejecting motivatability by clippings or formally related shorter synonyms has almost no effect in either of the two languages, as only 0.52% of English and 0.84% of German words are affected by this phenomenon.

440 The few instances of full motivatability with minor restrictions are not considered in these calculations.

However, there are three aspects that can be used to swing the balance in favour of one of the two languages. Excluding grammatical polysemes as a motivating force would make the figures drop by 15.80% for English and only 5.08% for German, resulting in a 10.72% advantage for German, which would clearly cement its higher motivatability rate in comparison to English, with 61.44% as against only 43.80%. However, it is also possible to manipulate the results in such a way that English becomes the more highly motivatable language. If semantically restricted analyses are not allowed as motivating, the drop of 13.64% in English as against 23.56% in German results in a 9.92% advantage for English, with a motivatability rate of 45.96% for the BNC items and 42.96% for the DWDS words. This tendency can be strengthened by discarding words containing self-elaborated affixes as opaque, a decision affecting 8.80% of the German but only 1.68% of the English list words and resulting in a 7.12% advantage for English. If the last two aspects are combined and the overlap of 9 items, i.e. 0.36%, for English and 89 items, i.e. 3.56%, for German is subtracted from the combined reduction so that these items are not counted twice, 441 the English motivatability rate of 44.64% is clearly higher than the German one of merely 37.72%. Nonetheless, such treatment has the disadvantage that it sees both languages, or at least both high-frequency vocabulary lists, as predominantly unmotivatable, i.e. as more negative than necessary. Furthermore, one must ask oneself how important such a 6.92% difference can be considered with respect to real-life matters: Moore (1997: 504-506) points out that in large samples - such as those used in the present study - even tiny deviations from the null hypothesis will be significant, but that statistical significance is not identical with practical significance. Therefore, he suggests asking oneself whether the effect in question is large enough to be of practical importance. The strictest possible definition of motivatability would ignore almost identical proportions of list words - 45.76% in English and 47.66% in German. In this sense, alternative calculations would not lead to drastically different results.

441 To give an example, the English words containing both a self-elaborated affix and a semantic obstacle have the codes MPAl# RT (1x), MPFl# (6x), MPl# (1x), MPLl# (1x).

With regard to expandability, the same restrictions can be introduced with the necessary changes. Thus, English is more expandable than German with 98.88% vs. 96.48%, but if partial expandability is not allowed, the result is reversed, with 88.36% in German as against 87.96% in English. However, the differences are only minor. This is also true if one considers the subtypes of partial expandability in English and German as a potentially variable factor: 0.20% vs. 0.12% for semantic obstacles, 6.48% vs. 5.64% for marked expansions, 0.16% vs. 0.76% for the acceptance of formally related longer synonyms as expansions and an identical proportion of 0.20% items that are expanded by a grammatical polyseme are such small differences that re-calculation seems unnecessary. In all of these cases except in the expansion by synonyms, English is affected more strongly than German by the obstacles. If this factor is not regarded as reducing expandability, but formal differences - which make up 4.44% of the expansions of the BNC words as against a proportion of 1.84% for the DWDS items - are taken into account, it is possible to manipulate the difference between the original expandability in English and German in such a way that the results are reversed and the maximum difference for this direction is reached. Then, 88.04% 442 expandability in English is opposed to 88.88% in German, which still corresponds to a slight difference of 0.84%. By contrast, it is also possible to stress the higher expandability rate of English by excluding expansion by a longer synonym, thereby contrasting an expandability rate of 98.72% for English and of 95.72% for German. This gives the maximum difference of 3% in this direction.

Another way in which expandability could be restricted is by excluding words from particular sources. The exclusion of corpus-based data would affect 3.76% of the English and 5.88% of the German list words, thereby increasing the differences between the two languages. By contrast, only admitting the learner’s dictionaries as the sources for the expansions would result in a reduction of the opposition, with 82.84% of the English and 82.20% of the German items fulfilling this criterion. In any case, the reduction of the sources will usually not improve the standing of German with respect to expandability, and English remains the more easily expandable language.
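The combined exclusion of semantically restricted analyses and self-elaborated affixes discussed in this section can be retraced as a small calculation. The following is only a sketch under stated assumptions: the base motivatability rates of 59.60% (English) and 66.52% (German) are taken from the figures quoted for the original calculations, the list size of 2,500 lemmas per language from the study design, and the function name is purely illustrative.

```python
# Sketch of the overlap correction for two combined exclusions.
# Assumption: 2,500 list items per language; base rates as quoted in the text.
LIST_SIZE = 2500


def combined_rate(base_rate, drop_a, drop_b, overlap_items, list_size=LIST_SIZE):
    """Motivatability rate (in %) after applying two exclusions at once.

    Items affected by both exclusions would be removed twice if the two
    drops were simply added, so their share is taken out of the total drop.
    """
    overlap = 100 * overlap_items / list_size
    total_drop = drop_a + drop_b - overlap
    return round(base_rate - total_drop, 2)


# Drops: semantically restricted analyses, then self-elaborated affixes.
english = combined_rate(59.60, 13.64, 1.68, overlap_items=9)
german = combined_rate(66.52, 23.56, 8.80, overlap_items=89)

print(english)                      # 44.64
print(german)                       # 37.72
print(round(english - german, 2))   # 6.92
```

The output agrees with the rates of 44.64% and 37.72% and the 6.92% difference given above.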
442 The 0.48% represented by the 12 English items with the double codes FL, FSL, FWL, LI#, Fl, FSL#, FWL and L# and the 0.2% of the five German items with the double codes FL, FVL, FSlI and L# have already been added.

5.2.2 A less restrictive approach

While the previous sections have introduced more restrictive ways of dealing with motivatability and expandability, it would also be conceivable to define those concepts less strictly than in the present study. For instance, an extremely lax definition of motivatability may also take the transparent items into account, thus leading to a 7.84% difference between English and German, with the latter representing the more motivatable language. However, this figure is only very slightly higher than the original difference of 6.92%.

As the present approach aims at the discovery of motivatability as a simplifying factor in vocabulary acquisition and learning, it is relatively generous in terms of what it accepts as motivatable. A less restrictive approach would therefore have to concentrate mainly on the definition of expandability. Thus, the analysis list is based on a British English corpus. As it is legitimate to stay within one particular variety of English, the expansions that count as marked only because they bear the label BE in the respective dictionaries could be given full status, which would raise the degree of full expandability in English by 1.36%. 443 The resulting 89.32% of fully expandable items in English would surpass the German rate of 88.36% - yet only very slightly. Twelve expansions labelled by English dictionaries and 21 German items could have their marking disregarded on the grounds that their respective bases bear the same label in the dictionary, e.g. formal thereupon and upon.
444 As self-marked expansions represent a subjective decision by the analyst, they could be ignored in an alternative calculation, raising the levels of full expandability by 10 English and 61 German items having no further marking, and thus resulting in 88.36% and 90.80% fully expandable items respectively - which would increase the difference between the two languages by 2.04%.

443 Of the 37 expansions marked as British English, three have an additional marking as informal that prevents them from reaching full status, namely me-too, pub crawl and okey-doke, which is also formally different from its base okay.
444 However, some of these are doubly-marked items as well.

5.2.3 Conclusion

The previous sections have shown that it is possible to modify the motivatability and expandability rate in such a way that there is a small predominance of either English or German. Be that as it may, most of the ensuing differences are so small that they are practically insignificant. One may therefore agree with Justice (1982: 260), who finds that “Reshuffling the marginal examples would not radically change the statistical conclusion” - in this case the conclusion that English and German behave similarly enough to state that differences between the two languages’ degree of dissociation have been unduly stressed in the past, at least as far as the highest-frequency vocabulary is concerned. Because of the possible variance in the results, those introduced in Chapter 4 will be used in the following comparisons with previous research.

5.3 Comparison with previous studies

It has been mentioned before that no empirical studies so far seem to aim at testing Leisi’s hypothesis of the dissociated English vocabulary in contrast to the consociated German lexicon.
Nonetheless, the research carried out by Fill (1980) and Scheidegger (1981) on the contrastive motivatability of English/German and German/French shows enough overlap to allow a comparison with the present study - even though it must not be overlooked that those studies take slightly different approaches in their vocabulary selection and with regard to what is considered a motivated word. For this reason, the following sections also give a brief summary of the approaches followed in the two studies.

5.3.1 Fill (1980)

The most important contrastive study of lexical motivation in English and German is Fill’s Wortdurchsichtigkeit im Englischen (1980). Fill uses a corpus of English and German texts 445 with their translation into the other language. All in all, the impressive number of 30,191 noun, adjective and adverb tokens and 1,067 verb tokens are considered in the so-called Paralleluntersuchung, which compares the translation equivalents with regard to their motivatability (Fill 1980: 77-84). Of course, this causes problems whenever a particular word cannot be recognised in any form in the translation. Therefore, the getrennte Untersuchung, a separate study, looks at about 21,000 English and American word tokens and at about 17,400 German word tokens from original texts. Fill complements his token-based approach by analysing the word types on every 50th page in Cassell’s (1968) German and English and English and German Dictionary, which yields a total of 1,778 word pairs.

Faced by obstacles in the lexical analyses, Fill introduces Vorstufen der Wortdurchsichtigkeit ‘preliminary stages of word transparency’ 446 (Fill 1980: 50-52). These comprise Stützung, Teildurchsichtigkeit, blends, ablaut pairs, umlaut pairs and conversion. In the case of the Stützung of a word, it is possible to discern at least one affix and a Stützelement, i.e. a supporting element, which recurs in other words, but never as a simplex, 447 e.g. the *ceive in con|ceive, de|ceive, and the *ling(en) in ge|ling|en and miß|ling|en. Teildurchsichtigkeit, i.e. partial transparency, occurs in complex words which are transparent only with regard to one stem word, such as English cran|berry or German Him|beere. Words which are transparent with respect to an affix only, such as the negative un- in un|couth and un|wirsch, or words such as dishevell|ed, wick|ed, ge|feit, whose affixes give information on their part of speech only, are not considered as really belonging to the preliminary category. Stützung 448 is counted as opaque, while Teildurchsichtigkeit 449 ‘partial transparency’ is subsumed under the transparent cases. Adverbs in -ly are not counted as transparent, as they are considered to belong to the domain of syntax rather than word formation, but by including them in the token count, transparency is reduced. Fill says that the problematic cases in word transparency affect only a small percentage of the words (Fill 1980: 81) - a claim that contradicts the experience gained in the analyses carried out for the present study. He works mainly with texts, which means that there are many ad-hoc word formations on his list and that the analysis material is not chosen on the grounds of its usefulness for learners. 450 Function words are regarded as part of syntax and are therefore excluded (Fill 1980: 84) - which means that the most frequent words are discarded a priori.

445 His corpus consists of the first 25 pages of 20th-century novels, each sample comprising about 2,000 to 2,500 word tokens, and of other texts, such as several poems, two dramas, and newspaper articles (Fill 1980).
446 Fill uses the term Durchsichtigkeit ‘transparency’ for what the present study calls motivatability. Where Fill’s work is discussed, the terms transparent and motivatable are used interchangeably.
447 Cf. Fill (1980: 51): Stützung is “die Durchsicht auf mindestens ein Affix, sowie auf ein Stützelement, das in anderen Wörtern wiederkehrt, jedoch als Simplex nicht vorkommt”.
448 There are 488 instances of Stützung in English and 168 in German (Fill 1980: 91).
449 There are 105 instances of Teildurchsichtigkeit in English and 21 in German (Fill 1980: 91).
450 Thus, one reason why German novels are more transparent may be the existence of literary neologisms such as Herbstlaubmuster, Schlafkrankheitszone, Dampfkesselfriedhof and Ziegenmeckerlachepoche (Fill 1980: 100).

Table 121: Transparency in Fill’s separate study: in percentages

Part of speech               Text type     English   German
nouns, adjectives, adverbs   novels        24        46
                             non-fiction   44        62
                             dictionary    68        83
verbs                        novels        15        41
                             non-fiction   21        58
                             dictionary    30        62

Table 121 summarises the results of Fill’s separate study (Fill 1980: 120). It is immediately obvious that motivational figures are always higher for German than for English. The highest level of transparency is reached in the type approach that compares the dictionary entries - which is the kind of analysis that comes closest to the present study and which will be used as the basis for comparisons below. Verbs are found to be less motivatable than the other parts of speech, and the difference between English and German is particularly strong among them. Unfortunately, Fill does not indicate the number of verbs as opposed to the number of nouns, adjectives and adverbs he considers in his study. As this prevents the precise calculation of a single motivational figure, the subsequent comparisons use the unweighted average of the two dictionary figures, even if the resulting proportion of 49% transparent words for the English dictionary items can be assumed to be clearly below the actual figure. 451 These results lead Fill to conclude that the intuitive verdict of some scholars that German is significantly more transparent than English is true (Fill 1980: 119).
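The averaging step just described can be made explicit. The sketch below is illustrative only: the function names are hypothetical, the input figures are the dictionary rows of Table 121, and the verb share of 22.64% is the proportion from the present study's English list that footnote 451 uses for its estimate.

```python
# Sketch: combining Fill's dictionary figures into a single rate.
# Function names are illustrative; the figures come from Table 121.

def unweighted_average(non_verb_rate, verb_rate):
    """Plain mean of the two part-of-speech groups."""
    return (non_verb_rate + verb_rate) / 2


def weighted_average(non_verb_rate, verb_rate, verb_share):
    """Mean weighted by the share of verbs among the analysed items."""
    return verb_share * verb_rate + (1 - verb_share) * non_verb_rate


print(unweighted_average(68, 30))                  # 49.0  (English)
print(unweighted_average(83, 62))                  # 72.5  (German, reported as 73)
print(round(weighted_average(68, 30, 0.2264), 2))  # 59.4  (English, verb share 22.64%)
```

Since verbs are the less transparent group and presumably the smaller one, the unweighted mean gives them too much weight, which is why the 49% figure is best read as a lower bound.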
This implies that German has a greater tendency to express a particular content through several morphological and semantic elements - but only on the word level. Fill (1980: 149-150) stresses that German is not generally a more transparent language than English because transparency is also possible on a higher level: thus, Turm ‘tower’ is opaque, Hochhaus ‘skyscraper’ is transparent, hohes Haus ‘high house’ is not a word - but transparent as well. Fill calls this phenomenon syntagmatic transparency. He finds that in 80% of the cases where a transparent word is rendered by a syntagma in a text and in 86.5% of dictionary entries where the same occurs, a transparent German word corresponds to a syntagmatically transparent English construction. This confirms Fill’s hypothesis that the lower degree of lexical transparency in English is at least in part compensated by syntagmatic transparency (Fill 1980: 150).

451 For instance, the proportion of verbs in contrast to nouns, adjectives and adverbs among the English analysis items in the present study is 22.64%. If this proportion were used as the basis, Fill’s motivational rate would rise to 59.40%. However, as part of speech was not encoded for the German list items, it is not possible to make an estimate of how Fill’s figures for German could be changed on the basis of the present study. Therefore, the average is used for the comparisons.

5.3.2 Scheidegger (1981)

Scheidegger’s Arbitraire et motivation en français et en allemand (1981) offers no comparison of English and German, but rather contrasts German and French. Nonetheless, the results for the first of these two languages are an important contribution to the comparison of the present study’s results with those of its predecessors.

Scheidegger’s selection of the word lists aims at representativity of the contemporary spoken language of his time and at comparability of the French and German material as far as the criteria for choice, semantic fields and the number of words are concerned (Scheidegger 1981: 40-42). That is why two frequency-based basic vocabularies are chosen, namely the Français fondamental 1er degré as well as part of the second level for French (cf. Gougenheim et al. 1967), and Oehler’s Grundwortschatz Deutsch (1966) for German. After making some additions and deletions, Scheidegger arrives at two lists containing 1,756 words each, of which 89.5% have a semantic equivalent in the other list (Scheidegger 1981: 43). Like Fill, Scheidegger eliminates all function words, thereby retaining only nouns, verbs and adjectives (Scheidegger 1981: 42). In contrast to Fill, he includes all his word lists in the book, but he does not explain what the motivating elements are.

Scheidegger gives an extensive account of problematic cases and how they are treated in his analysis: linguistic signs are only classified as motivated when they represent the union of two distinct signs, one of which determines the other (Scheidegger 1981: 47). Consequently, French printemps ‘spring’ is treated as a simplex because even though temps ‘time’ is evoked, *prin is opaque synchronically (Scheidegger 1981: 46). Similarly, the days of the week are considered to be arbitrary, just like the so-called apophonies isolées, e.g. Huhn/Hahn and Tür/Tor (Scheidegger 1981: 51). However, there are some exceptions (Scheidegger 1981: 47): thus, malheur ‘bad luck’ is treated as a motivated word because it stands in opposition to bonheur ‘good luck’ like the quasi-synonymous pair malchance/chance. Développer ‘unwrap’ is motivated by the prefix dé- and envelopper ‘wrap’; the latter, however, is arbitrary even if it is possible to discern the prefix en- (Scheidegger 1981: 49). Clippings are considered to be motivated by their longer forms if those are still used, e.g.
auto by automobile and photo by photographe (Scheidegger 1981: 56-57). Even if a derivative has experienced a semantic extension which is not marked in its form, e.g. in the case of épicier ‘greengrocer’, who sells épices ‘spices’, but also many other things, such words are considered motivated (Scheidegger 1981: 52). Scheidegger also accepts words as motivated even if only one of their meanings actually is, as in embrasser ‘to kiss’, which is motivated in its less frequent sense of ‘putting one’s arms around someone or something’ (Scheidegger 1981: 52-53). Like Fill (1980), Scheidegger only makes yes-no decisions between motivated and arbitrary words. Items which are semantically related but show small formal divergences, such as sel ‘salt’/sal|er ‘to salt’ or diriger/direc|tion, lire/lec|ture, are considered motivated (Scheidegger 1981: 53).

Ultimately, the study yields the result that motivation is slightly higher in German than in French, with 35.7% as against 28.8% of the analysis items (Scheidegger 1981: 83). Therefore, Scheidegger (1981: 87) arrives at the conclusion that it is not possible to consider German a language where motivation is the rule and arbitrariness the exception. The majority of the words are arbitrary in both languages.

5.3.3 Summary

Table 122: Motivation in previous studies: in percentages

Author        Material                     English   German
Fill          novels (n, adj, adv)         24        46
Fill          non-fiction (n, adj, adv)    44        62
Fill          dictionary (n, adj, adv)     68        83
Fill          novels (v)                   15        41
Fill          non-fiction (v)              21        58
Fill          dictionary (v)               30        62
Fill          dictionary (average)         49        73
Scheidegger   basic vocabulary list        -         36

Table 122 reveals that the figures for the motivation of the English and German vocabulary vary greatly, depending on the material used in the respective analysis: thus, the results fluctuate between 15% and 68% for English and between 35.7% and 83% for German. In all cases, though, German comes before English.
If Fill’s research is represented by the dictionary average only, the comparison with the present results can be summarised as in Table 123.

Table 123: Comparison of previous studies and the present study: in percentages

Author                      English   German
Fill (dictionary average)   49        73
Scheidegger                 -         36
Sanchez                     60        67

As far as the figures for German are concerned, Fill and the present study reach fairly similar results, while Scheidegger’s figures are much lower. With respect to English, though, Fill’s 49% result lies very clearly below the roughly 60% outcome of the present study. This is highly surprising, as one would expect the motivatability of high-frequency vocabulary to be lower than that of random dictionary entries. Furthermore, the average conjoint motivatability of nouns, adjectives and adverbs in this research project is 66.93%; if the proportion for verbs is added and the average calculated, the resulting 61.23%, which makes the figure more comparable to Fill’s result, represents an even higher motivatability rate than in Table 123. 452

452 The same phenomenon, resulting from the omission of the assumedly low-motivational functional words, can be expected for German, which would bring the percentages even closer to Fill’s result.

A possible explanation for this phenomenon lies in the definition of what is accepted as a motivated word. As pointed out previously, Fill does not accept motivatability by an affix only, and adverbs in -ly are also considered arbitrary. If the MPA-motivatable words in the present study are counted as unmotivatable, 180 English and 136 German list items are affected. There are 82 adverbs ending in -ly in the BNC list, all of them motivatable; 72 of them even fully. Taken together, this means that there would be 262 fewer English and 136 fewer German motivatable words among the original list words if Fill’s principles were applied.
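The adjustment to Fill's principles is a matter of simple counting and can be retraced as follows. This is a sketch under stated assumptions: each list is taken to contain 2,500 items, the original motivatability rates are taken to be 59.60% (English) and 66.52% (German), and the function name is illustrative rather than part of the study's data base.

```python
# Sketch: motivatability rates if Fill's stricter principles were applied.
# Assumptions: 2,500 list items per language; original rates as quoted.
LIST_SIZE = 2500


def rate_after_removal(original_rate, removed_items, list_size=LIST_SIZE):
    """Rate (in %) after reclassifying `removed_items` as unmotivatable."""
    original_items = round(original_rate / 100 * list_size)
    return 100 * (original_items - removed_items) / list_size


# English: 180 affix-only (MPA) items plus 82 adverbs in -ly.
english = rate_after_removal(59.60, 180 + 82)
# German: 136 affix-only (MPA) items.
german = rate_after_removal(66.52, 136)

print(f"{english:.2f}")           # 49.12
print(f"{german:.2f}")            # 61.08
print(f"{german - english:.2f}")  # 11.96
```

The resulting gap of roughly 12 percentage points is larger than the original 7 points but still only half of the 24 points separating the two languages in Fill's dictionary averages.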
The resulting motivational percentages of 49.12% for English and 61.08% for German are based on all parts of speech; if functional words were omitted, they would be slightly higher. Nonetheless, the proportional differences change only slightly: while English and German differ by 24% in Fill’s study, the present results show a 7% difference, which increases to 12% if Fill’s principles are applied. However, the resulting variation cannot change the general result, namely that English and German behave far more similarly in the present study than in the previous research. 5.4 Limitations and countermeasures The present research project attempts to study the phenomenon of dissociation in its manifold aspects as precisely as appropriate and as objectively as possible. The principal aim is to supply empirical data for a linguistic theory that has apparently not been subjected to this kind of test before - at least not in the generally accessible literature. Nevertheless, the results discussed in the previous sections must be considered with due caution because both the design of the research project and the nature of the subject itself impose a number of limitations on the data. Albert and Koster (2002: 2-3) distinguish quantitative and qualitative research in their introduction to empiricism in linguistics and language teaching research. 453 As qualitative research very frequently involves introspection, and as the necessity for empirical research is justified by the fact that certain linguistic problems transcend the linguistic ability of the researcher, one may conclude that qualitative research involving introspection is not empirical research in the strictest sense. This would also apply to the project described here, which resorts to introspection in some areas, e.g. that of motivatability, and which is also interested in the qualitative aspects of dissociation. 
However, as real language data is collected and then analysed in the present study, with the quantitative part playing a decisive role, it should conform more or less to the requirements of empirical studies, e.g. as defined in the SOED: “Based on, guided by, or employing observation and experiment rather than theory” (SOED s.v. empirical).

453 They do not consider qualitative research because they say that it does not involve numerical data. However, in two other contexts (Albert and Koster 2002: 3 and 156), it becomes clear that qualitative research can also involve calculations, even if these are not typically the main focus of attention of this type of research.

As the hypotheses introduced above cannot be tested in every possible aspect, 454 the aim is to use representative samples and methods in order to make more general statements. The reasons underlying the choices and decisions, which are described in Chapter 2 and Chapter 3, should be fairly understandable, but this does not exclude the possibility that the samples are not as representative of the aspect of reality being tested as they should be. For instance, the DWDS Core Corpus is based on written language only. One may argue that in order to achieve higher comparability, the English frequency list should have been based on the written BNC only. However, the position adopted here is that the most representative corpus for both languages should be chosen, as the written-only approach would reduce the representativity of the BNC unnecessarily. Furthermore, Hunnius (1983: 213) doubts that high-frequency vocabulary is a good basis for motivational studies, as the number of complex words in this group is assumed to be lower than average. However, using the most frequent part of the lexicon as the basis for analysis makes the results more relevant for native speakers and foreign learners of the language, who both need to master this frequency slice in order to use the language effectively.
Furthermore, in a contrastive study such as the present one, the influence of the lower proportion of complex items is relatively negligible because it should affect both languages to a comparable degree. Consequently, even if the present results can only be considered representative for the most frequent part of the English and German vocabulary, one may assume that the general motivatability of both languages is even higher and that the results can thus be generalised in certain respects. 455
A related critical area is that of quality requirements: in order to be accepted as reliable, empirical studies should yield identical results when repeated under the same circumstances (Albert and Koster 2002: 12). However, the design of the present research project makes a true repetition impossible. Having dealt with the listed items more than once, the analyst may particularly remember the solutions for problematic items. Nonetheless, as the data base of the present study is made accessible to interested readers, 456 other researchers are enabled to compare the results of the present study with their own analyses.
454 For instance, it is impossible to analyse every word of the two languages in question before arriving at a conclusion.
455 Figure 6 and Figure 11 allow the conclusion that the motivatability rate of both English and German slowly but steadily grows with decreasing frequency.
456 Please get in contact with the author either directly or via the publisher.
The procedure used here is not the only one possible, 457 and even though it can be justified for the reasons outlined above, it is not necessarily the best. Its basic problem is the fact that the introspection that is drawn upon particularly for the motivational analyses introduces an element of subjectivity.
458 Nevertheless, while this point deservedly attracts criticism, the empirical study of dissociation presents every researcher with the same dilemma: the objective determination of a word’s parentage to other words is only possible on the diachronic level - if it is possible at all. By contrast, the contemporary synchronic relations between words, which correspond to the experience of the linguistically inexpert language user and which are the focus of interest of this study, are not documented in a way that could be profitably included in the research. 459 Consequently, it is not possible to determine the degree of dissociation in contemporary English and German without resorting to a large number of relatively subjective decisions, which may be specific to the analyst. 460 That there is no hard-and-fast criterion guaranteeing a safe motivational analysis for all words is already pointed out by Gauger (1970: 122), who considers the existence of borderline cases that can be categorised in different ways a general feature of natural languages. The problem of subjectivity in the analysis of dissociation, which poses itself to every researcher working on the topic, 461 can be explained by the involvement of semantics. As there are no generally accepted criteria that could be used in the delimitation of word meanings, one may therefore be tempted to reduce the influence of this factor to a minimum: indeed, it is possible to carry out lexical analyses that are purely form-based, or to let a computer perform the task. This may even result in meaningful analyses for a considerable number of words, e.g. gardener or basketball, whose decomposition poses no problems. However, other analyses of this type result in nonsensical decomposition: thus, bishop would be analysed into bi- and shop, coalition into coal and -ition, or badminton into bad, mint and on.
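The behaviour of such a purely form-based procedure can be illustrated with a minimal sketch. It assumes a hypothetical toy inventory of forms (`FORMS`) and an illustrative function `segment`; neither is part of the study, and a real procedure would draw on a full dictionary. The function returns every exhaustive split of a word into listed forms and has no access to meaning.

```python
# Hypothetical toy inventory of forms (assumption for illustration only).
FORMS = frozenset({
    "garden", "er", "basket", "ball",
    "bi", "shop", "coal", "ition", "bad", "mint", "on",
})


def segment(word, forms=FORMS):
    """Return every exhaustive split of `word` into listed forms.

    The procedure is purely formal: it checks letter sequences only
    and knows nothing about the meanings of the pieces.
    """
    if not word:
        return [[]]  # one way to split the empty remainder: no pieces
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in forms:
            # Keep this head and split the remainder recursively.
            for tail in segment(word[i:], forms):
                results.append([head] + tail)
    return results


for word in ("gardener", "basketball", "bishop", "coalition", "badminton"):
    print(word, segment(word))
```

With this toy inventory, the sketch treats gardener and bishop in exactly the same way: both receive one formally valid split, and nothing in the output marks the second as semantically unfounded.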
As these segmentations are merely transparent but not motivated (motivation requires semantic similarity), and as such cases would not be distinguished from the legitimate analyses above, no conclusions for dissociation could be drawn on the basis of such a procedure. 462 After all, the definition of consociation states very clearly that not only formal, but also semantic features are involved. 463 Quite plausibly, then, semantics is such an essential part of the concept of consociation/dissociation that it simply cannot be avoided in the empirical study of this phenomenon, with all the consequences this has for subjectivity.
457 A contrastive study of motivation in French and Italian employing a completely different methodology makes use of the judgments of several hundred native speakers on 700 lexical units per language (cf. Marzo and Rube 2005, and Koch and Marzo 2007).
458 However, Rettig (1981: 78) defends introspection as a linguistic method in such a context.
459 There seems to be no synchronic reference work for English, and the criteria underlying the establishment of word families in the German Wortfamilienwörterbuch der deutschen Gegenwartssprache (Augst 1998) are problematic (see Section 3.3.1).
460 Also cf. Fill (1980: 49). For instance, Quirk et al. (1985: 1551) refer to the word business as a ‘frozen’ item in which no base is recognisable, while the present approach accepts busy as a valid, if both formally and semantically slightly altered, constituent.
461 Consequently, Leisi would have been confronted with the same problems if he had tested his hypothesis himself.
The material used in the search for expansions imposes a number of limitations on the results of the study as well. For instance, the dictionaries used for English and German are not entirely comparable with respect to their number of entries, which may distort the contrastive results - if only very slightly, as most of the items analysed are expandable anyway.
Another problem lies in the restricted search facilities of the DWDS Core Corpus. It is not possible, for example, to search for sequences of characters that are truncated both at the beginning and at the end in this corpus. Truncated sequences are excluded from lemmatising, too. This is particularly problematic because German words are often inflected. With the additional difficulty of distinguishing between upper and lower case (which is important because affixations may change their case if nouns are involved as a base or complex item), many variants have to be entered in the search window before one can conclude that there is no expansion in the DWDS Core Corpus for a particular word. Such processes are prone to error, and valid expansions may therefore have been overlooked in this study in some cases. Similarly, the retrieval of partial expansions involving a spelling difference (i.e. those involving the F-labels with the exception of the only-spoken obstacles and vowel mutation 464) depends to a certain degree on chance. This can be attributed to the fact that the search pattern *word* only finds complex words containing the exact sequence of letters of the search string. Consequently, all instances where expanded words differ in spelling from the original word in more than a vowel mutation are the result of a successful hypothesis on the part of the researcher, who has a different command of the two languages.
462 For instance, a purely formal computer-based approach would make at least 110 semantic mistakes in the English items that are unmotivatable but transparent. To this can be added 290 partially motivatable words with spoken and written obstacles, 46 partially motivatable words with written obstacles, 46 fully motivatable words with written obstacles only, 24 items with vowel changes and the 353 instances of incomplete analysability, which the computer may not recognise. Even if there is a certain degree of overlap between the categories, these figures show that a purely form-based approach is error-prone in many respects.
463 Cf. Leisi (1999: 51) in Section 1.1.
464 The German electronic dictionaries used here also retrieve vowel-mutated words if the vowel without the diacritic sign is used within the search item.
So, even if the whole project relies to a large extent on the aid of computers (e.g. for the establishment of frequency lists from corpora, in the search for expansions in electronic dictionaries and in the sorting of the data base), the analyses ultimately depend on a fallible human researcher. Consequently, the readers are requested to kindly excuse instances where coding mistakes may have occurred as the result of occasional typographic errors, incorrect, overly strict or lax application of the rules, where existing expansions may have been overlooked in results lists of several hundred items, or where mistakes that currently escape the imagination of the analyst may have crept in. 465 It is to be hoped, though, that not too many of these errors have escaped the repeated processes of revision, so that one may confidently hope for an appropriate level of precision in this respect. Be that as it may, it is highly probable that such errors are relatively evenly distributed across languages and should not alter the results substantially. There is yet another area in which human researchers are more prone to error than machines, and which is more difficult to control: Leech (1981: 72) mentions the danger inherent in introspective studies that the researcher might “make the ‘facts’ suit his theory”, which can occur both consciously and subconsciously.
While such behaviour on the conscious level goes against the recommendations on guaranteeing good scientific practice set up by the Deutsche Forschungsgemeinschaft (1998) and is consciously avoided by the present analyst, the subconscious level, by definition, escapes all active attempts at control. 466 However, the design of the present study should contribute to the reduction of subconscious manipulation, as the procedure employed here did not contrast the languages directly. Instead, the analyst proceeded intralinguistically, and the figures for English and German were only compared in the final analysis. This means that the analyst concentrated on each individual word and only had a vague impression of the degree of consociation of the languages until late in the research.
465 Partial motivatability is particularly vulnerable in this respect because of the manifold distinctions that are made. However, fine-grained categories are necessary in order to provide a justifiable basis for the subsumption into larger categories and in order to allow alternative calculations.
466 The recommendations of the Deutsche Forschungsgemeinschaft (1998) even contain a passage stating that, contrary to dishonesty, bona fide mistakes belong to scientists’ fundamental rights.
After all, the suspicion of being biased cannot be taken seriously enough. For instance, Scheidegger (1981) is criticised by Blumenthal (1997: 110) and Hunnius (1983: 214) for seeming to solve contentious cases by classifying German words as unmotivated and French words as motivated. 467 Hunnius (1983: 213-214) attributes this to Scheidegger’s different levels of competence in the two languages - a factor that may also have played a role in the present analysis. Bally (1909/1951: 55) argues very plausibly that motivation should be felt to be stronger in a foreign language than in one’s native language because more attention than usual is paid to the form.
468 As a consequence, it is theoretically possible that this study, written by a native speaker of German, might have overlooked motivatability in German rather than in English, thereby minimising the differences between English and German. However, every effort was made to treat the two languages identically by always aiming at the highest level of motivatability in both English and German. The awareness of a subconscious bias towards English may even have had the opposite effect of exaggerating motivatability in German. 469
467 However, it can be argued in his favour that there are also several instances where he decides to his own disadvantage, e.g. in the case of médecine, which can be related to médecin ‘doctor’.
468 Cf. Bally (1909/1951: 55): “il est naturel que dans une langue étrangère, où nos associations pensée-parole sont imparfaitement fixées, l’esprit se laisse induire à exagérer les effets musicaux là où ils existent, et à les imaginer là où il n’y en a pas. […] Plus les associations évoquées par le sens des mots font défaut, plus on voit intervenir les associations artificielles provoquées par la forme de ces mots”, “It is natural that in a foreign language, in which our associations between thought and word are imperfectly fixed, the spirit permits itself to exaggerate the musical effects where they exist and to imagine them where there are none. […] The more the associations evoked by the meaning of the words are lacking, the more the artificial associations provoked by the form of these words can be observed to intervene” (my translation).
469 Furthermore, the current approach would count German words as motivatable that are otherwise classified as unmotivated by a number of German philologists. For instance, neither Püschel (1978: 158) nor Ulrich (1972: 283) accepts motivation of Handtuch ‘towel’ by Hand ‘hand’, as towels may also be used for other parts of the body.
Considering the numerous limitations mentioned above, one may wonder whether dissociation should be studied quantitatively at all. Quite predictably, the answer given here is yes, for the following reason: as long as Leisi’s statements about the dissociation of the English vocabulary and the consociation of the German vocabulary are not tested, they remain nothing but hypotheses, although these hypotheses seem to have entered basic teaching of English linguistics in Germany as commonly accepted facts. The attempt to confirm or refute such influential claims is definitely a justifiable aim. Even though Leisi illustrates his point with numerous examples, one must argue in the words of Weinreich (1955: 539) that “Illustrations are not evidence”. Ullmann, at whose Précis de sémantique française (1952) this criticism was directed, defended his position by countering that the tendency towards dissociation of the French language “is so obvious that no numerical demonstration is needed; hence the general consensus of opinion on this point” (Ullmann 1962: 106). However, claims such as these can in turn be countered by the theories of Karl Popper (reported in Leech 1981: 60):
there is no expectation that we shall ever arrive at ‘the truth’, but rather, the method of science ensures better and better approximations to truth, by eliminating errors in theories - that is, by falsifying hypotheses. We can never prove a theory true, but we can (if it makes claims which can be tested) prove it false. Thus even the most well-founded theories are tentative or provisional; they are (to use another formulation of Popper’s) ‘bold conjectures’.
It may therefore be concluded that the testing of theories, particularly of established ones, is a scientific necessity in the search for truth - which is in turn a vital characteristic of science itself -, 470 even if the subject imposes certain limitations on the approach. In this respect, there is an analogy to the marking of pupils’ and students’ performances by their teachers: according to Becker (1983: 11), innumerable publications prove that marks are an unsuitable instrument for the objective and reliable evaluation of pupils’ and students’ performance. 471 Nevertheless, they persist, and efforts to replace marks with other evaluative systems have been relatively unsuccessful (Becker 1983: 24-29). Becker (1983: 32) therefore concludes that teachers have to go on giving marks, but that they should at the same time express their doubts about the validity of the process. For the same reason, the limitations of the present research project are also pointed out, but the need for the analysis as such is not called into question. 470 Cf. the recommendations of the Deutsche Forschungsgemeinschaft (1998): “Forschung im idealisierten Sinne ist Suche nach Wahrheit.” 471 Ingenkamp (1997: 107) summarises the results of a study by Starch and Elliot (1913) in which the same examination was corrected by 128 different teachers and scored between 28 and 92 out of 100 possible points. This result is particularly surprising insofar as the examination in question was from the domain of mathematics, where a high degree of coincidence would have been expected. He also reports evidence from studies in which correctors had to mark the same examination again after several weeks or months - with the result that the evaluations varied considerably (Ingenkamp 1997: 112-113). Starch and Elliot (1912) are also reported to mention several factors causing variability in evaluation, such as the different emphasis laid on particular parts of a solution. 
To this can be added Becker’s (1983: 12) observation that teachers have a certain tendency to mark very strictly or more leniently, which is independent of the actual results. Ingenkamp (1997: 110-111) also draws attention to the fact that teachers generally assess individual examinations in relation to the performance of other pupils within the same class. He reports the results of a comparison between different sixth forms in Berlin, where the level in mathematics varies so widely that it suggests that a performance which would correspond to an A in one particular class may actually be lower than that marked with an F in another class. This finding is of particular importance in view of the fact that some crucial decisions in a person’s life are mainly based on marks, such as admission to a higher class and to university, or the conferment of degrees and scholarships (Ingenkamp 1997: 116).
In addition, a number of countermeasures against the limitations discussed above were taken in the present study:
1. the word-by-word intralinguistic procedure and the conscious awareness of a possible bias should reduce biased analyses to a minimum
2. the category of partial motivatability allows the necessary degrees of gradience 472
3. several processes of revision should ensure a maximum of correctness in the data base
4. the data base is made publicly accessible to allow comparative studies. 473
These countermeasures should reduce the problematic effects to a minimum.
472 There are numerous ways in which a word cannot be said to be completely unmotivatable while belonging only to the periphery of motivatable words. The category of partial motivatability therefore allows delimitation of full motivatability in a very strict fashion, but at the same time, a more generously defined percentage of motivatability can be calculated as well.
473 Even if the data obtained through introspection is subjective and therefore fallible, making it publicly accessible is a first step towards objectivisation, as it can be confirmed or refuted by the introspections of others (cf. Leech 1981: 71).
6 Consociation and dissociation in perspective
The results of the present study have been extensively discussed in Chapter 4 and Chapter 5, and the previous sections have also compared the outcome of this research project with prior studies in the domain of motivation. What remains to be done is to view the phenomenon of consociation/dissociation and the resulting effect it has on English and German under a variety of perspectives. Dissociation being a lexical phenomenon, it can be related to different areas that deal with vocabulary in some way or another. Fascinating as the findings may already be in themselves and as partial confirmation and disproval of widely held linguistic opinions, they become of particular interest when applied to the field of psycholinguistics, in particular to the mental lexicon, and to didactics, in particular to vocabulary teaching. A brief discussion of the role of dissociation within these domains will be followed by some concluding remarks.
6.1 Psycholinguistic perspective: the mental lexicon
One frequently quoted definition of the mental lexicon is that given by Schwarz (1996: 84): “Das mentale Lexikon ist der Teil des LZG [= Langzeitgedächtnis], in dem die Wörter einer Sprache mental repräsentiert sind.” 474 The basic elements of this mental subsystem are the lexical entries, i.e. lexical units which relate the phonological, syntactic and semantic information on words to each other (Schwarz 1996: 125). According to Clark (1993: 3), entries in the mental lexicon must include at least four kinds of information on each of the items: meaning, syntactic form, morphological structure and phonological shape.
Apart from this, they may also include additional information on the status, the usage and the connotations of the item (Clark 1993: 4). 475 Figures referring to the assumed extent of the mental lexicon vary widely. 476 In a second language, the size of the mental lexicon largely depends on the stage the learner has reached at a particular point in time, which makes individual speakers even less comparable due to the different amount of learning time or effort they have put into vocabulary learning.
474 Schwarz (1996: 84): “The mental lexicon is that part of the long-term memory in which the words of a language are mentally represented” (my translation).
475 Other types of information could be added to this, such as a memory of the context in which a particular word was heard for the first time, or a sound image of a particular person saying that word, at least for certain items. In the case of linguists, even more information than usual may be stored for particular words, e.g. their etymology, their age, whether their constituents are productive, etc.
476 Pinker (1995: 150) arrives at about 60,000 words and Laufer (1998: 255) reports a vocabulary of 18,000-20,000 word families for 18-year-old English native speakers. In contrast to this, Clark (1993: 13) distinguishes between active and passive vocabulary. She comes to the conclusion that adult speakers have a production vocabulary of between 20,000 and 50,000 word forms, while their comprehension vocabulary is considerably larger. Augst, Bauer and Stein’s (1977: 11-13) calculations for German, which are based on the results obtained by testing two adult German native speakers on a certain percentage of the words in a dictionary, amount to about 95,000 words. However, one may expect considerable individual variation between speakers; otherwise, aptitude tests based on vocabulary exercises would make no sense (pointed out by Thomas Herbst).
6.1.1 Full listing vs. minimal listing
The most important issue relating the phenomenon of consociation to the mental lexicon is the question of the form in which complex words are stored. The alternatives are storage as a whole, the so-called full listing hypothesis (Butterworth 1983: 260), versus storage in a decomposed form, the so-called minimal listing hypothesis. 477 In a minimal listing model, complex words are analysed into their constituents, e.g. football into foot and ball. As each constituent is stored only once and combined with other elements if required, this treatment has the advantage of being non-redundant. For comprehension, the meaning of a complex word is deduced from the stored meanings of its constituents, and for production, mono-morphemic constituents are assembled to make up complex words. However, there are several reasons why such a model is not completely satisfactory.
477 Cf. e.g. Handke (1994: 91).
Firstly, a considerable number of words are transparent, but not motivatable; a purely minimal listing model could arrive at no sensible interpretation of the words understand or bishop. Consequently, complex words with an unpredictable semantic component need to be stored.
Secondly, idiomatic speech requires particular concepts to be expressed by particular sequences of morphemes. For instance, ‘a tool made of twisted metal that you use to pull a cork out of a bottle’ 478 could be expressed by the complex words *corkpuller, *bottlescrew, bottle opener (which has a different meaning in English), *winetool, *winescrew or *decorker, but it is generally referred to as a corkscrew. Aitchison (1994: 128-130) suspects that common words are stored in a complex form with their prefixes and suffixes because otherwise there would be far more cases of wrong attachment, such as *dishappy or *non-happy instead of unhappy. Similarly, speakers of German need to know whether the first part of a compound has a singular form, as in Kopfkissen, a plural form, as in Kinderbett, or an s-form, as in Versicherungspolice. 479 Götz (1999: 222) concludes that German dictionaries have to list a large number of compounds “if only for reasons of morphology”. The same can be assumed to be true of the mental lexicon.
478 This is the definition of corkscrew in LDOCE.
Thirdly, complex words may be semantically regular, but have formal particularities: Aitchison (1994: 129) and Clark (1993: 7) distinguish two types of suffixes. The suffixes of the first class, which includes -ity and -al, affect their stems fairly radically, as in sanity or industrial. There can be shifts in the word stress or palatalisation of [k] to [s], among other things, such as in electric > electricity. These suffixes are sometimes referred to as plus-type suffixes. By contrast, the hash-type suffixes of the second class, such as -ness or -ism in goodness and alcoholism, have little or no effect on the stem. Aitchison (1994: 129) believes that plus-type suffixes “are glued firmly onto their stems”. Even though the evidence is less clear with hash-type suffixes, she assumes that these are “probably firmly fixed in common words” as well (Aitchison 1994: 130).
Fourthly, there are some complex words which have no synchronic base in the language from which they could be derived. These baseless derivatives (Aitchison 1994: 129), such as perdition, conflagration and probity, definitely need to be stored as wholes.
In fifth place, even though analytical linguistics attempts to analyse units as much as possible down to minimal units, a cognitive viewpoint recognises that people commonly learn larger units.
“For example, the lexeme activity is surely learned and used as a unit by English speakers despite the fact that it can be analyzed into three morphemes” (Lamb 2001: 169). Jackendoff (2002: 152) goes even further:
The word dog must be stored in long-term memory: there is no way to construct it online from smaller parts. By contrast, the reader probably understands the utterance My dog just vomited on your carpet by constructing it online from its constituent words, invoking the combinatorial rules of English grammar. However: since I have memorized this sentence, having used it as an example in many talks, it is present in my long-term memory. In fact any utterance can be memorized, all the way from little clichés to the words of Take Me Out to the Ballgame to the entire text of Hamlet. Thus we cannot predict in advance that any particular part of an utterance must be constructed online by a given speaker. We can predict only that speakers can construct some part online on demand if it hasn’t already been memorized.
In sixth place, some complex items occur very frequently in the language and are used by young children who cannot be assumed to know their less frequent constituents. Consequently, words such as famous may need to be stored in their full form.
479 Käge’s (1980: 30-31) examples Rind|fleisch, Rind|s|gulasch and Rind|er|herz also differ with respect to their use of interfixes in spite of an identical first constituent.
Finally, people who are asked whether they know a certain new complex word usually say yes or no - which means that as far as they have conscious access to their mental lexicon, complex words are also stored in it, and it is possible for them to check whether words are in the (consciously accessible) store or not. 480
The arguments presented so far seem to favour a full listing rather than a minimal listing approach.
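The contrast between the two storage hypotheses discussed in this section can be made concrete with a small sketch. All entries, glosses and function names below are hypothetical and serve illustration only; they are not taken from the study or from any psycholinguistic model.

```python
# Hypothetical toy entries; the glosses are illustrative assumptions.
CONSTITUENTS = {
    "foot": "lower end of the leg",
    "ball": "round object used in games",
    "under": "below",
    "stand": "be upright",
}

# Full listing: complex words are stored as wholes.
FULL_LISTING = {
    "football": "game played by kicking a ball",
    "understand": "grasp the meaning of",
}


def minimal_listing(parts):
    """Compose a gloss from the stored constituents only."""
    return " + ".join(CONSTITUENTS[part] for part in parts)


def full_listing(word):
    """Look the complex word up as a whole; None if it is not stored."""
    return FULL_LISTING.get(word)


print(minimal_listing(["foot", "ball"]))    # composed from two entries
print(minimal_listing(["under", "stand"]))  # composed, but not sensible
print(full_listing("understand"))           # stored as a whole
```

In this sketch, the composed gloss for understand is formally well-formed but misses the actual meaning, which is only available from the whole-word entry.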
However, such a model cannot account for the treatment of unfamiliar words or neologisms. If the mental lexicon were working solely on a full-listing basis, it would be impossible to understand a nonce word such as *wug-eater because it is not part of the internal word list. However, the fact that the word’s meaning can be deduced via the meanings of its components - *wug-eater being likely to mean ‘someone or something that eats wugs’ - 481 proves that a purely full-listing approach cannot be right. Consequently, the idea of a combined model that integrates both complex words and smaller constituents suggests itself. Even if a “fundamental motivating principle for many linguistic theories is the minimal-redundancy, maximally economical grammar or lexicon” (Jurafsky 1996: 144), “there is no justification for assuming that concepts of analytical linguistics can be taken over directly into an understanding of the cognitive basis of language” (Lamb 2001: 177). By contrast, psychological models “often emphasize the vast storage capability of the mind” (Jurafsky 1996: 114). In the same vein, Aitchison (1994: 131) believes that “people can decompose words into morphemes if they need to. […] It is also probable that they disassemble a word if they are faced with a long, complicated one, [sic] whose meaning they are not quite sure about.” Therefore, “massive parallel processing may be the norm. Humans may automatically handle words on more levels than one at the same time. They may subconsciously both deal with the word as a whole and split it up as they go along” (Aitchison 1994: 131). Aronoff and Anshen (1998: 240) suggest that “the search for the proper word can be viewed as a race between the mental lexicon and the morphology. 
Both operate simultaneously, and the faster one wins.” A more detailed account is given by Frauenfelder and Schreuder (1991: 172):
Baayen’s Race Model assumes that all morphologically complex words have both a full listing and a morphologically decomposed entry and thus can be recognized in principle by either the direct or the parsing route. The time taken to process a word with the direct route depends upon the token frequency of the target word. […] These two routes start simultaneously and race in parallel, with the one reaching completion first giving its output. Although the direct route is generally quicker than that involving parsing, the two overlap temporally to a limited extent. Consequently, low frequency forms can be recognized via either route.
The prediction that frequent words are directly accessed may explain why speakers are so reluctant to analyse them. 482
480 Furthermore, Schreuder and Baayen’s (1997: 133) finding of a “substantial effect of family size provides evidence that indeed the meanings of large numbers of complex words are stored” (cf. Section 6.1.2.2).
481 The imaginary word *wug comes from Berko (1958: 154).
6.1.2 The effect of consociation/dissociation on the mental lexicon
The central question with regard to the mental lexicon in the present context is in what way consociation and dissociation have an influence on the aforementioned processes. The two components of consociation (motivatability and expandability) are considered individually before attempting a joint conclusion.
6.1.2.1 The effect of motivatability
Any kind of transparency - however limited it might be - can be useful for the receptive language user because it reduces the number of possible hypotheses: if someone points at a house in the distance and says “Look at that *wug”, listening learners are confronted with a multitude of possible meanings such as ‘house’, ‘colour’, ‘funny roof’ or ‘massive door’.
If the sentence “Look at that *window-wug” is uttered in the same situation instead, this will not result in an unequivocal interpretation, as window-box, window-pane, window plant, window-screen and window-sill are all among the possible targets. Nevertheless, many parts of the house, such as the chimney or the roof tiles, will be ruled out. The same is true of derivatives with an unknown base but a discernible suffix (Randall 1988: 230):
Consider two words of comparable length and frequency: [xxxxxx] and [yyyish]. To understand [xxxxxx], aside from the usual contextual cues, there is very little to go on. One must simply learn the meaning of the word. For the complex word [yyyish], the same contextual cues are available, but there are two additional sources of information: the base [yyy] and the -ish affix. However unfamiliar [yyy] might be, -ish reveals some insight into both [yyy] and the complex word. [yyyish] is an adjective; [yyy], either a noun (like [man] + ish; [fool] + ish) or an adjective ([yellow] + ish; [lively] + ish). Neither one is a verb.
Randall (1988: 230) therefore formulates the hypothesis “that complex words are easier to understand than simple words”. Blumenthal (1997: 109-110) also argues in favour of complex motivated words when he says that German Walfisch ‘whale’, literally ‘whale-fish’, and Seehund ‘seal’, literally ‘sea-dog’, may be misleading but are preferable to their unmotivated French equivalents baleine and phoque because the former give a first hint of the words’ meaning. However, Blumenthal (1997: 110) also warns that the use of such motivation must not be overestimated, and he claims that many learners of German may actually prefer learning an arbitrary word from scratch to finding their way in the labyrinth of German prefixes. 483
482 This also explains the surprise of the analyst at discovering that function words are often analysable.
Interestingly, Štekauer (2005: 245-246) finds that native and fluent non-native speakers show no significant differences in their degree of meaning prediction with novel, context-free naming units. While there are many potential readings of new words, there are usually only very few highly predictable ones (Štekauer 2005: 257). He concludes from his research that those “readings of novel, context-free naming units which express stable and habitual relationships and/or are based on prototypical features of the objects named show a higher meaning predictability” (Štekauer 2005: 251). However, any “figurativeness, that is to say, any semantic shift (metaphor and metonymy) appears to become a serious obstacle to meaning predictability” (Štekauer 2005: 246). 484
If the meaning of the majority of words in a language is to some degree derivable from their linguistic form, as is the case in a consociated language, this should be an aid in comprehension and put dissociated languages at a disadvantage. However, this deduction is not absolutely true: analysability is only of importance in comprehension if the encountered words are unfamiliar. As the 2,000 most frequent words cover about 90% of slightly difficult texts, and as the first 4,000 most frequent words cover some 96% (Stein 2002: 34-35), learners should be able to deal with unknown texts relatively successfully as soon as they have reached a certain degree of proficiency. 485 Of course, one might argue that it is precisely the less frequent words which make the largest semantic contribution to a text (Wolff 1969: 232). However, the less frequent words are generally more motivatable (cf. Figure 6 and Figure 11). This is true of the languages that were considered in the present study, and it can probably even claim universal status.
Therefore, as a certain number of the high-frequency items have to be learned by rote even in highly consociated languages, and as a large proportion of the low-frequency items can be expected to be motivatable in some way, the comprehension of dissociated languages should pose fewer problems than one might initially believe.
483 For instance, über- ‘over-’ and unter- ‘under-’ are antonyms, but übernehmen ‘take over’ and unternehmen ‘undertake’ are not, even if the English words consist of the same constituents as the German ones.
484 This justifies the consideration of the meaning-related obstacle category # in the present analysis. Institutionalised figurative meanings - which are not considered as obstacles in the present study, as they should be included in the respective words’ dictionary entries - do not seem to reduce predictability (Štekauer 2005: 247).
485 According to Laufer (1997: 20-24), there is a strong link between reading comprehension and vocabulary knowledge. The threshold vocabulary comprises about 3,000 word families or 4,800 words.
Another argument that relativises the importance of consociation is the fact that idiomatic speech production relies rather heavily on fixed relations between meaning and form (cf. Sinclair 1991: 110-111). As has been mentioned before, there are many possible ways of expressing the concept ‘a tool made of twisted metal that you use to pull a cork out of a bottle’, but corkscrew is usually the only one used in normal conversation and writing. Consequently, once such links have been established in a speech community, analysability or phonetic similarity may only play a minor role. Nevertheless, motivatability may be helpful in speech production: in a completely opaque language, the number of possible formal realisations for a particular meaning would be infinite because it would include all phonotactically possible sound chains. 486 To use the above example again, in addition to the words mentioned, the concept ‘corkscrew’ could also be expressed by *baid, *fuke, *potilodimp etc. The knowledge that the target word is likely to be motivatable drastically reduces the number of possible solutions: for instance, it is highly unlikely that shoe-lace would express this concept. Of course, there is no guarantee that an idiomatic realisation will be produced in a motivated language, but from a communicative point of view, it is much better to say *corkpuller instead of corkscrew than nothing, and this is certainly better than producing a completely opaque sound chain which will leave the hearer puzzled.
By contrast, Ronneberger-Sibold (1980: 141) makes an interesting observation relating to issues of economy in the grammar of inflecting languages: once speakers have decided on the morphemes they want to use, such as {1st person}, {singular}, {present}, {indicative}, {active}, the easiest way would be to have one morph for each, as would be the case in an agglutinative system. However, inflectional systems require joint coding as a portmanteau morpheme {1st person singular present indicative active} and its realisation as the corresponding morph. While this may result in an increased coding time, the phonological outcome is shorter (Ronneberger-Sibold 1980: 141). Thus - to use an English example - the irregular past tense form took is shorter than the regular past tense hated, even though the infinitives of the two verbs have the same length. These results could also be transferred to word-formation: unmotivatable words are shorter than their motivatable equivalents would be. For example, ball is shorter than *roundthing, and bed is shorter than *sleeping-place. This means that a language with many unmotivatable words may require a longer encoding time because the forms that have to be retrieved from the mental lexicon show no similarity to the linguistic forms of the semantic constituents (e.g. ‘round’, ‘object’ or ‘toy’ for ball). However, as the encoding time is a matter of milliseconds, it is outweighed by the far larger economy in articulatory transmission time. Thus, while languages with a lower degree of motivatability may require a stronger intellectual effort, they are more economical from the physical point of view. 487 This may also be the reason why sign languages that are originally pantomimic or iconic have a tendency to become more arbitrary (Frishberg 1975: 696): the American Sign Language word gold used to be a compound of earring and yellow, but now the movement is a smoother blend that increases fluidity (Frishberg 1975: 707). If motivation were as important as many people seem to believe, obscuring would be impossible. This also applies to the spoken language. Frishberg (1975: 718) therefore offers the assumption that there may be a universal ideal proportion between icons and symbols that any language will approach. 488
486 This does not necessarily exclude forms that already occur in other words, as polysemes and homonyms are always possible.
A large number of different studies have been devoted to testing whether the motivatability of words has any effect on their storage, processing etc. in the mental lexicon. The present study will confine itself to presenting only a small selection of results. While Feldman and Raveh (2002: 195) find that motivation does not facilitate decision latencies, other studies imply that motivation, partial motivation and even transparency may have a positive effect on lexical processing.
For instance, according to McQueen and Cutler (1998: 407), words “may be recognized faster if a morphologically related word has recently been processed.” Formal and semantic lexical relatedness other than by a shared free base is also relevant in psycholinguistic processes: Taft and Kougious’ (2004: 9) experiments demonstrate “facilitation in a masked priming experiment for the semantically related pairs that share an initial orthographic subunit (e.g., virus-viral), but not for the semantically unrelated pairs (e.g., future-futile)”. They conclude that as “semantic relatedness alone did not produce priming (e.g., pursue-follow), it seems that priming arises only when there is a conjunction of form and meaning” (Taft and Kougious 2004: 9). 489 These findings are supported by Derwing (1976: 50-51), who asked subjects whether they believed that particular words came from other words, e.g. teacher from teach or precious from price. His results show that both form and meaning play an important role. Morpheme recognition “is very highly related to semantic similarity (r = .79, p < .00001), but less so to phonetic similarity (r = .40, p < .002)” (Derwing 1976: 52).
487 Other arguments why motivational darkness can be useful, e.g. in euphemisms and in the support of sociolinguistic identity, are mentioned in Ronneberger-Sibold (2001).
488 Furthermore, a low degree of lexical transparency can at least in part be compensated by syntagmatic transparency, i.e. by paraphrasing (cf. Fill 1980: 149-150 or Leisi 1999: 54).
489 However, this only seems to work if there is systematic sublexical structuring - in contrast to haphazard orthographic overlap in semantically related pairs such as screech/scream or frost/freeze (Taft and Kougious 2004: 13).
According to Stolz and Feldman (1995: 109), results indicate that “even when the components of a word are not easily decomposed either orthographically, phonologically, or semantically, skilled readers still demonstrate sensitivity to component morphological structure.” 490 Zwitserlood’s (1994) study offers evidence that (Dutch) compounds with one transparent constituent share properties of motivated compounds. According to Libben et al. (2003: 60), compounds with and without a transparent first element showed no difference in reaction time as long as the head was motivated. Consequently, the present study’s category RT, which comprises words containing a transparent remainder in addition to a motivated part, may have at least some psychological validity as far as the decomposability of the items is concerned. 491 It is even possible that the purely transparent words from the category UT also have an advantage in storage and processing over completely opaque words: Drews et al. (1994: 289) report the interesting result that German particle verbs such as abtreiben, which have a motivated and a transparent reading, 492 behaved like motivated particle verbs in a priming experiment. Schreuder, Burani and Baayen (2003: 167) conclude from their research that “the constituents of opaque words are not only identified, but their meaning becomes activated as well”. However, they are contradicted by Marslen-Wilson et al. (1994: 17), who find that “Phonetic overlap between primes and targets does not by itself produce priming”.
As far as the relations between the truly morphosemantically motivated words are concerned, derivatives are connected to all the other lexical entries sharing the same root, and Clark (1993: 5) goes even further:
There are also interconnections among lexical entries that contain the same derivational affixes (e.g., all the words with -er, with -tion, with -ity, or with -ness).
These interconnections link lexical entries through meaning (for each affix), syntax (the resultant syntactic category of the derived word), and morphology. 493
Bybee (1998: 434) writes that the structural units below the word level “are not independent units, but rather emerge from these larger stored units via a network of connections among them.” This is indicated by the heavier lines in Figure 14. 494 As actual tokens of use rather than smaller units such as bound morphemes seem to be stored in memory, affixes and “roots or stems have no separate representation, but exist only as relations of similarity among words” (Bybee 1998: 422).
490 These results indicate that the words with the codes F and # for formal and semantic problems are rightly counted as motivatable.
491 Yet some of the words categorised as RT may have a transparent head and some other motivatable part.
492 The motivated meaning of abtreiben ‘drift away’ can be directly derived from its constituents ab- ‘away’ and treiben ‘drift’. However, even if the verb’s other meaning, ‘abort’, is less directly derivable from the prefix and the base, the approach adopted in the present study would not consider it transparent but rather partially motivatable due to a semantic obstacle.
493 Compounds should then also be connected through their roots.
Figure 14: Sets of lexical connections yielding word-internal morphological structure
In Bybee’s (1995: 428-429) model, words in the mental lexicon “are related to other words via sets of lexical connections between identical and similar phonological and semantic features.” While this means that morphosemantically related words have links among each other without being broken up into their constituent structure, the model also implies that unmotivated words are not isolated in the mental lexicon, but rather integrated into the lexical network by means of formal and/or semantic similarity to other items.
To sum up the empirical psycholinguistic research, some studies find an effect of motivation, whereas others find none. Consequently, the results are inconclusive, even if there seems to be more evidence that motivation and even transparency have a positive effect on lexical processing. The results of the present study support the hypothesis that English is a motivated rather than an unmotivated language, and that it therefore profits from the advantages that motivation offers. However, even a language with an extremely low degree of motivatability - if such a language exists at all - poses fewer problems for learners than one may initially expect because the exigencies of real-time conversation do not allow consideration of each word on its own for a long time: in normal discourse, motivatable German Bahnhof ‘station’ does not work differently from a simplex and is a label just like opaque French gare (Scheidegger 1981: 95-96). 495
494 Source: Bybee (1995: 429).
6.1.2.2 The effect of expandability
De Saussure (1916/1960: 183) believes that all languages contain at least some relatively motivated words. It also follows from this that there should always be a certain number of expandable words. With regard to the mental lexicon, expandability should increase the number of links of the base word. Even if mental links to motivating constituents may be stronger than those to expanded forms because there are fewer alternatives, expandability should represent an advantage in terms of interconnection. This is supported by several experimental results.
Schreuder and Baayen (1997: 121) “use the term morphological family to denote the set of words derived from a given stem by means of either compounding (tablespoon, timetable) or derivation (tablet, tabular)” and “refer to the number of different words in the morphological family (excluding from the count the base word itself) as the morphological family size and to the summed token frequencies of these words (now excluding the stem frequency of the base) as the cumulative family frequency”. In their experiments with Dutch words, “the nouns with a high number of descendents [sic] were responded to more quickly than the nouns with a low number of descendents [sic]” (Schreuder and Baayen 1997: 125). 496 A difference in family size of 16 items was enough to improve reaction time by 40 ms, which suggests that the counting of types is more important than the number of tokens. Furthermore, “counts of family size and cumulative family frequency for English complex words (including compounds written without intervening spaces) […] reveal the same kind of correlational structure” (Schreuder and Baayen 1997: 135). 497 They consequently assume “that family size may also be an important factor in the lexical processing of English monomorphemic words” (Schreuder and Baayen 1997: 135). The prerequisite for priming is semantic relatedness, as transparent family members do not contribute to the family size effect (de Jong, Schreuder and Baayen 2003). Research by Feldman and Pastizzo (2003: 253) suggests that there is morphological facilitation even when a word is preceded by an only partially semantically motivated expansion - with words with a relatively small family showing more morphological facilitation than words with a large number of relatives. Feldman and Pastizzo’s (2003) research proves that partial expandability is a facilitating factor in processing and that words can be rightly classified as expandable even if there are only partial expansions. Combined with the other findings, it is possible to conclude that expandability is definitely a consociating factor, at least in a psycholinguistic perspective, and that a large number of expansions can make up for lack of motivation to some degree.
495 Cf. Bauer (1983: 44): “When a term is actually accepted by society, it becomes assimilated and used in exactly the same way as other lexemes of the language. The speaker-listener forgets why the word has the form that it does, and considers it merely as the appropriate label for the concept in question, even if the form is perfectly transparent.”
496 The descendants and their frequency are based on the 42-million-word CELEX lexical database (Schreuder and Baayen 1997: 120), but unfortunately, nothing is said about the authors’ criteria for the establishment of the word families.
497 By contrast, de Jong et al. (2002: 557) conclude from their experiments “that English open compounds do not belong to the morphological families of simplex words”. Hyphenated compounds, however, improved response times (de Jong et al. 2002: 566).
6.1.2.3 Conclusion
Even though one must be careful not to transfer the results of a lexicological study one-to-one to the mental lexicon, prior work in the field of psycholinguistics suggests that the analyses carried out in the present research project may have some psychological validity - if only on a subconscious level. In this sense, this large-scale study complements psycholinguistic research that is based on only a few example words 498 and finds that both motivatability and expandability of the vocabulary of a language may have a certain degree of influence on the acquisition, storage, comprehension and production of vocabulary items. Nonetheless, the effect of dissociation on the mental lexicon may be slightly less marked than has been believed so far.
Due to their high resting activation levels, frequent forms should typically be retrieved via the direct route, regardless of whether they are analysable or not. This means that the most relevant part of the vocabulary should be retrieved identically in both consociated and dissociated languages. Be that as it may, the results of the present study indicate that English is not a dissociated but rather a consociated language, just like German. If English, which has been used as a classic example of a dissociated language, thus fails to be dissociated, one is tempted to ask whether there are any dissociated languages at all.
498 Unfortunately, the testing of motivation in those studies is frequently based on vocabulary items that are not ideally suited to the task. Thus, Dohmes, Zwitserlood and Bölte (2004: 204) consider butterfly opaque, even though butterflies fly. The landmark study by Marslen-Wilson et al. (1994: 11-12) considers responsible/response unmotivated, even if one may see a relation here, whereas friendly/friend is named as a model for high semantic relatedness in the instructions - in spite of the fact that one can be friendly without being someone’s friend.
6.2 Didactic perspective: vocabulary learning and teaching
Insofar as word-family integration can be applied in intentional vocabulary learning and teaching, the findings of the present study are of interest from a didactic perspective. Even if the author of the present study is a linguist rather than a language teaching methodologist, the interdisciplinary enterprise of applying the present study’s linguistic results to a neighbouring discipline is considered a necessity and is therefore attempted in all modesty. According to Wilkins (1972: 111), very little can be conveyed without grammar, but nothing can be conveyed without vocabulary.
Gauger (1970: 46) stresses that in minimal forms of linguistic communication, vocabulary knowledge is more important than knowledge of all other elements. Considering that most learners do not aim at a quasi-native-like mastery of the language, but rather at a certain level of communicative competence, the importance of vocabulary acquisition becomes clear. Quite plausibly, then, the treatment of vocabulary should be given a very prominent place in language teaching. However, it seems that in the school context, grammar still holds a considerably more important position.
There is psychological evidence that new phenomena - including words - can be remembered more easily if there is some relation to previous knowledge (cf. Kielhöfer 1994: 215 and Mietzel 2001: 190-191). In view of the extremely large part of the vocabulary that can be related to other words, even in the high-frequency ranges, making use of such inherent learning aids is therefore highly economical and efficient. This becomes of particular importance in the school context, where many words have to be learned in a relatively short period of time. According to Stork (2003: 11), this problem is often avoided in practice by turning vocabulary learning into homework that the pupils have to do by themselves. With this in mind, the teaching of learning strategies becomes particularly important. 499 As the present study deals with the most frequent English and German words, which are relevant for language learners, its results can represent a help in the choice of vocabulary learning strategies.
6.2.1 The effect of full motivatability
9.64% of the English words and 17.68% of the German items analysed in the present study belong to the category of full motivatability. If words containing self-elaborated affixes or a formal obstacle in either the spoken or the written variety are considered as belonging to this highest category as well, 13.48% of the English words and 17.68% of the German items should not pose any problems to learners acquiring the items in question. However, Weisgerber (1953: 173) points out that normal language users are relatively reluctant to recognise word families. Therefore, Hausmann (2002a: 262) demands that pupils should be made aware of word-formation structures; the learners need to build up what he calls a transparency competence. While good language learners develop this strategy by themselves, 500 the weaker learners in particular ought to be taught the strategy of decomposition, which is most easily exemplified on the basis of fully motivatable words.
499 The teaching of learning strategies is considered in the German national educational standards for the intermediate level school leaving qualification (cf. the Bildungsstandards für die erste Fremdsprache (Englisch/Französisch) für den Mittleren Schulabschluss at <http://www.kmk.org/schul/Bildungsstandards/1.Fremdsprache_MSA_BS_04-12-2003.pdf>, p. 18, 15.11.2006).
6.2.2 The effect of partial motivatability
Hausmann’s and Weisgerber’s statements are even more important with respect to partial motivatability because it pertains to a relatively large part of the vocabulary analysed here: 46.12% of the English and 48.84% of the German words can be related to motivating constituents while displaying some kind of obstacle. Consequently, pupils have to be told that a word such as French rapetisser ‘to reduce’ contains petit ‘small’ - otherwise, they will overlook such relations in the future (Hausmann 2002a: 262). The data in Sections 4.1.6 and 4.2.5 illustrates that there are several types of obstacle that learners should be made aware of. Most of these apply to both English and German, but as there are some differences, it makes sense to give separate advice for the two languages. In English, almost 16% of the items are motivatable, but with formal constraints.
In most of these cases, both spelling and pronunciation are concerned. Therefore, it is important that learners should be encouraged to seek similarities in spite of formal differences. 501 Some of these, such as the fact that the base of words ending in -ion is frequently modified, e.g. in produce/production, decide/decision, or changes between <f> and <v> in life/live, belief/believe, are fairly regular - a fact that can be easily taught in the classroom. 502
500 Cf. Lübke (1984: 378). Derwing, Smith and Wiebe (1995: 8) find that the skill in identifying the root morphemes of derived words depends on “educational experience, intelligence and even curiosity about language”.
501 According to Cutler (1981: 76), motivation in word formation does not require the intact preservation of the whole base word, but merely enough of it to ensure recognition - how much precisely will differ from word to word.
502 The change between <f> and <v> is already taught at school in the context of irregular plural formation (pointed out by Hans Rainer Fickenscher).
With 15.80%, the proportion of words that are only motivatable by a grammatical polyseme is larger than that of the fully motivatable items. 503 This shows the necessity for awareness-raising in this domain, which is far from obvious for a linguistically inexperienced language user. In any case, learners ought to be made sensitive towards incomplete analysability - i.e. the fact that words are not completely understandable from their constituents and that unclear parts may remain. 504 One of the central results of the present study is the fact that 14.12% of the English words belong to this category. The largest proportion of these items is made up of words that are motivated by one - in very few cases more - affix only. Drawing on Leisi’s example for dissociation, oral is perhaps not motivatable for many language users because there is no recognisable base.
However, if learners are taught that prefixes and suffixes can give a clue to a word’s meaning, e.g. by indicating its part of speech, this may help them to comprehend words such as oral, which is likely to be an adjective - or a noun. Teaching the importance of affixes for word meaning recognition is particularly important for weaker learners: according to a study by Freyd and Baron (1982), average students, particularly young ones, rarely make use of derivational relations in order to learn new words, 505 whereas superior word learners are better able to use derivational relations (Freyd and Baron 1982: 291). Anglin’s (1993: 146) interviews with language learners also suggest that whole-word constituents are more salient than morphemes. He reports that across all grades in the literature reviewed, only the root word, but not the affixes, was usually explicitly discussed by the children that were interviewed (Anglin 1993: 86). In the same vein, Augst (1975: 38) writes: 506
Nun ist gerade für den Sprachteilhaber dieser Unterschied zwischen gebundenen und freien Morphemen sehr entscheidend. Die freien Morpheme dominieren für ihn derart, daß er die wenigen gebundenen Lexeme gar nicht abtrennt [...]. Alle gebundenen synsemantischen Morpheme hält er überhaupt nicht für erwähnenswert.
Consequently, making learners aware of the importance of affixes may help them a lot in vocabulary learning. 507
503 If all words that can be potentially motivated by a grammatical polyseme, even if they are already motivatable on a higher level, were taken into account as well, this should lead to a rise in figures, even if a brief look at the motivatable list words conveys the impression that motivatability by grammatical polysemy hardly ever occurs with otherwise motivatable items, and if so, then mainly in incompletely analysable items.
504 Hausmann (2005: 21) agrees that the French folk etymological constituent chou ‘cabbage’ in choucroute ‘sauerkraut’ should not be abandoned due to the differences between the remaining constituent and the word croûte ‘crust’.
505 Freyd and Baron (1982: 288) found that “there is a common type of error that involves treating a derived word as if the affix were simply absent”. In such cases, the part of speech of a derived word was given incorrectly, but would have been correct for the root of that word. “Subjects who have difficulty with derived words frequently deal with that difficulty by ignoring the affixes completely” (Freyd and Baron 1982: 288).
506 Augst (1975: 38): “For the language users, the difference between bound and free morphemes is crucial. The free morphemes are so dominant for them that they do not cut off the few bound lexemes. They even believe that all synsemantic morphemes are not worth mentioning” (my translation).
Semantic obstacles occur in more than 13% of the 2,500 English list words. It is therefore important to encourage learners to look for lexical relations, even if this requires a certain degree of abstraction. This is even more important in the learning of German vocabulary: of the 2,500 words from the DWDS Core Corpus, more than 23% require some kind of semantic effort. Foreign learners of German must also be aware of the fact that formal obstacles play an even larger role in German than in English - even though the differences are relatively small: 16.60% of the German list words feature formal differences between the item and its constituents. The data also suggests that learners should be made aware of the fact that vowel variation is a relatively frequent kind of obstacle in German. Otherwise, helpful relations such as that between Band ‘band’ and binden ‘bind’ may pass unnoticed. As in English, it is also important to encourage the learners to look for supporting elements, even if there are changes in the form, or if certain parts of a word remain unanalysable.
According to Hausmann (2005: 19), the relations between words should not be described but rather demonstrated to the learners, and word families such as French vaincre/invincible/victoire ‘defeat/invincible/victory’ or accéder/accès/accessible ‘access (v)/access (n)/accessible’ should be introduced as belonging together and learned in this way as well (Hausmann 2005: 23).
507 However, while Freyd and Baron’s (1982: 291) training of slow vocabulary learners led to inconclusive results, Sandra (1993: 290) finds that “the morpheme-based encoding technique is a particularly helpful technique for learning complex words but need not be taught to students, as they already spontaneously apply it”. However, “subjects who are told about the morpho-semantic properties of the complex words recall the stem of these words reliably better than control subjects if the morpho-semantic relationship is not too obvious and/or young subjects are used” (Sandra 1993: 291).
6.2.3 The effect of transparency
Contrary to the analyst’s expectation, transparency turned out to be a relatively rare phenomenon in both English and German, so that its usefulness as a mnemonic device may be questioned. Only 4.40% of the BNC words and 5.32% of the DWDS words in the study could be analysed into pseudo-constituents. In addition, the pseudo-motivating elements, such as the verbs sup and ply in supply, are usually relatively rare. 508 Consequently, most of the transparency analyses from the study’s word list could not be used as an aid in vocabulary teaching because the constituents are less frequent and less important than the analysis items, and because the learners cannot be expected to know those constituents. However, 168 English and 208 German words are motivatable but contain a transparent remainder, which corresponds to 6.72% and 8.32% of the list words respectively.
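The proportions just reported can be retraced with a little arithmetic. The following is a minimal Python sketch, assuming that every percentage is calculated over the full 2,500 list words per language and that the 4.40% and 5.32% refer to the purely transparent words; the function and variable names are illustrative and not part of the study.

```python
# Illustrative check of the proportions reported for the transparency analyses.
# Assumption: all percentages refer to the full list of 2,500 words per language.
LIST_SIZE = 2500


def share(count, total=LIST_SIZE):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


def count_from_share(percent, total=LIST_SIZE):
    """Return the whole number of words that a percentage of total represents."""
    return round(percent * total / 100)


# Motivatable words with a transparent remainder (counts given in the text).
mixed_counts = {"English": 168, "German": 208}
# Words analysable into pseudo-constituents only (percentages given in the text).
purely_transparent_pct = {"English": 4.40, "German": 5.32}

for language, mixed in mixed_counts.items():
    pure = count_from_share(purely_transparent_pct[language])
    # Print the share of mixed cases, the derived count of purely
    # transparent words, and whether the mixed cases outnumber them.
    print(language, share(mixed), pure, mixed > pure)
```

Under these assumptions, the purely transparent words number 110 in English and 133 in German, fewer than the 168 and 208 mixed cases.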
It follows from this that there are more cases where transparent unmotivatable elements combine with truly motivating elements than words that are purely transparent but unmotivatable. Therefore, if transparency is used as a teaching device, the frequent combination with true constituents should be mentioned to the learners. 509 In view of the enormous memory achievement that the assimilation of hundreds of new sequences of sounds and letters demands, foreign language learners will be grateful for any kind of support (Sperber 1989: 116) - and transparency can represent such support. However, as transparency involves semantically wrong connections, the true morphosemantic relatedness of words should receive far more emphasis. The teaching of transparency in the classroom is only really of importance when learners are made aware of the fact that words can be deceptively transparent (Laufer 1997: 25-26) and thus misleading. Otherwise, a word such as discourse might be interpreted as ‘without direction’ by some learners. 510
508 This implies the additional danger of overlooking possible transparent constituents in the analyses.
509 Using pseudo-constituents is a strategy that is in some ways similar to the keyword method, in which one first has to choose an L1 or L2 word which is phonetically or orthographically similar to the L2 target word. Then, one has to create a strong association between the target word and the keyword, and construct a visual image that combines both referents, “preferably in a salient, odd, or bizarre fashion in order to increase its memorability” (Hulstijn 1997: 204). Thus, English learners of German can associate Raupe ‘caterpillar’ with rope and imagine a stretched-out caterpillar (Hulstijn 1997: 205).
According to Hulstijn (1997: 210), the keyword technique is only rarely used in the instruction of foreign languages because many teachers do not find it “serious enough” and because it can only be successfully applied with a minority of vocabulary items. However, he points out that the keyword method does not replace established methods but is rather meant to complement them in the case of words that have proved of particular difficulty (Hulstijn 1994: 179 and 1997: 218). All this applies to the use of transparency as a mnemonic method as well.
510 Students were found to make significantly more mistakes in such words than in others. Furthermore, a comparison between the words students claimed they did not know and the ones they actually did not know showed that unawareness of ignorance was higher in transparent words as well (Laufer 1989: 14-15).
6.2.4 The effect of expandability
98.88% of the English and 96.48% of the German words analysed here can be expanded into longer words, most of them even without any kind of obstacle. Consequently, Hausmann’s (2002a: 262) advice on learning words in motivated families is backed by the data. To give an English example, the word play should not be learned by itself, but in connection with player, playground etc. 511 If attention is drawn to formally and semantically related word families, learners are enabled to recognise parallels in vocabulary - a useful competence in lifelong learning. In addition, awareness of expandability may encourage learners to look at the motivating direction as well. 512 Denninghaus (1976: 7) points out that learners may enlarge their vocabulary in both directions because it is possible that they first encounter a complex word such as uncertain and can thus integrate the word’s base certain into their potential vocabulary, or that they first encounter certain and thus potentially know uncertain.
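Denninghaus’s point that a word family can be entered from either direction may be illustrated with a small Python sketch. It is only a toy model: the two families are built from the examples mentioned above, and the names WORD_FAMILIES and potential_vocabulary are hypothetical and not taken from any of the cited works.

```python
# Toy model of a learner's "potential vocabulary" in the sense described above.
# Assumption: a word family is represented as a plain set of related words.
WORD_FAMILIES = [
    {"play", "player", "playground"},
    {"certain", "uncertain"},
]


def potential_vocabulary(encountered):
    """Return the family members of encountered words not yet encountered themselves."""
    potential = set()
    for family in WORD_FAMILIES:
        # A family becomes accessible as soon as any one member is known,
        # no matter whether that member is the base or a complex word.
        if family & encountered:
            potential |= family - encountered
    return potential


print(sorted(potential_vocabulary({"uncertain"})))  # ['certain']
print(sorted(potential_vocabulary({"play"})))       # ['player', 'playground']
```

The function treats base and complex word alike, so the family is opened up whichever member is met first.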
However, it is important to keep in mind that the meaning of complex words could be expressed by different surface forms, but that there is usually a norm realisation. 513 Thus, German Meer and See can both be translated as sea, but while See is established in the compounds Seenot and Seegang, Meer is the norm in others such as Meeresspiegel and Meeresgrund (Wandruszka 1969: 38). It is therefore necessary to transmit to the learners that creativity in the domain of word formation is an advantage in communication, particularly if required words are unknown or irretrievable, while stressing at the same time that there is usually a norm realisation of complex items, which should preferably be learned.
511 Of course, the choice of the word family members should be based on criteria such as usefulness and frequency.
512 Thus, Stein (2002: 133) suggests the expansion of Common Core Vocabulary items into word families as an exercise to be done “in the classroom, as a competitive word sport, or in the form of homework” because it will “increase the understanding of the lexical item at the centre of the family, the internalization of its meaning(s) and reduce the memorization effort.”
513 Hausmann (2004: 317) remarks that some - one may even want to argue all - established compounds can be interpreted as collocations. Cf. Seppänen (1978: 138): “Jedes Nominalkompositum, das einen gemeinschaftlich etablierten Wert bezeichnet, ist trotz seiner scheinbaren Analysierbarkeit zwangsläufig ein weitgehend arbiträres Sprachzeichen”, “Every nominal compound which refers to a commonly established value is inevitably a relatively arbitrary linguistic sign in spite of its seeming analysability” (my translation). This statement may also be expanded beyond the nominal compounds.
6.2.5 The effect of a Romance origin
Weis (1986: 178-179) tested to what degree English vocabulary is remembered by German pupils over a long period of time. Half of the words in the best-remembered group were of Germanic origin, but only 9% in the worst-remembered group. 514 Even though all of the pupils tested were learning French as well, it seems that vocabulary with a Romance origin caused the learners more difficulties. Considering that not only the majority of the English words but also the majority of the English affix types and tokens in the analysis items of the present study have a Romance background, the necessity of stressing consociating relations between English words, particularly for those of Romance origin, in the teaching of German learners becomes evident.
Another interesting observation with regard to etymology is made by Gelfert (2003: 58): he finds that learners of English tend to use more hard words than native speakers because they frequently know them as foreign words in their own language and perceive them as far less difficult. Interlingual motivation may thus reduce the difficulty of Romance words for learners whose language(s) contain true cognates. In this sense, Gelfert (2003) and Weis (1986) present the Romance origin as an advantage and a disadvantage respectively. The conclusion to be drawn from both studies, though, is that language learners should be made aware of such interlingual phenomena. 515
6.2.6 Conclusion and application in the classroom
The previous sections have shown that the results of the present study underline the importance of raising awareness of consociation in language learners. In both English and German language teaching, learners should be encouraged to become aware of the structures underlying the vocabulary items. Fully motivatable words provide a good starting point that allows generalisation to the partially motivatable items, which constitute the largest proportion of the high-frequency items.
While transparency may represent a mnemonic aid, it is only advisable for words that pose particular problems and should above all be taught as a possible obstacle to comprehension.
514 It is highly regrettable that Weis does not include the list with the 129 test items because it would have been interesting to see what the Romance words in the worst-remembered category were. Maybe a considerable proportion were of the oral type, i.e. consociated by an affix only.
515 Interlingual motivation plays an increasingly important role in language teaching: cf. Ogden (1946: 7), Macht and Steiner (1983: 100), Kemmeter (1999) and Gelfert (2003: 24).
As class time is limited, Nation (1990: ix) distinguishes “between the small number of high-frequency words, which all deserve lots of attention, and the very large number of low-frequency words, which require the mastery of coping strategies”. While the low-frequency words themselves do not deserve teaching time, the strategies for dealing with them should take a more prominent place (Nation 1990: 19). After all, “it is precisely those few tokens of rare types that carry the highest information load in any text, and therefore cause the most hindrance in the reading process when unknown” (Arnaud and Savignon 1997: 158). Even if word length may represent a complicating factor, Laufer (1997: 145) believes that a word’s difficulty of learning must be attributed to a variety of factors. After all, the “learner’s first encounter with a short word bun can be more puzzling than with interdisciplinary, provided the separate morphemes of the latter are familiar” (Laufer 1997: 45). Consequently, it is advisable to help learners overcome a possible initial fear of long words.
According to Scherfer (1989: 8), pupils find nouns easier to learn than verbs, then adjectives, adverbs and function words. So far, function words have usually been neglected in studies on lexical motivation.
However, the present research project was able to show that there is a considerable degree of motivatability in English pronouns (73.17%), prepositions (45.45%), conjunctions (39.29%) and determiners (37.21%). 516 Therefore, using full or partial motivatability in function words as an aid to remembering those vocabulary items as well may represent a worthwhile tactic.
While younger learners profit from the teacher’s guidance, Hulstijn (1994: 181) recommends that experienced learners should try their own strategies first. Vocabulary-learning activities should be preceded by a communicative phase, in which the pupils experience that the lack of certain words is annoying (Aßbeck 1990: 43). Similarly, de Florio-Hansen (1996: 8) suggests giving groups of pupils different word lists with new vocabulary to learn silently for 10-15 minutes - one arranged in random order, the other one in terms of word families and word fields. When it is tested how much each group remembers, the pupils will notice that the second strategy is more effective. Burgschmidt (1978: 72-73) suggests exercises in which the pupils have to paraphrase word-formations as their underlying sentence, or to translate word-formations and talk about interference problems.
On the basis of the present study’s results, morphological decomposition can be recommended as a highly effective vocabulary learning strategy. However, Nation (1990: 163) points out that guessing an unknown word’s meaning on the basis of its parts “is more likely to result in twisting the interpretation of the context than allowing interpretation of the context to modify the guess of the meaning.” 517 Therefore, he suggests a word-guessing strategy involving motivation but consisting of different steps (Nation 1990: 162-163):
1. determining the part of speech of the word
2. determining the relation of the word to syntactically related words, e.g. that of nouns to adjectives or of verbs to nouns
3. determining the relation of the clause/sentence in which the word is contained to others, e.g. cause, effect, contrast, inclusion, time, exemplification or summary 518
4. guessing the word’s meaning
5. checking whether the guessed word can be used as a substitute in the original sentence
6. only then breaking the unknown word into constituents if possible
7. checking if the constituents fit the guess. 519
As far as the Romance element in English and to a lesser degree in German is concerned, Stein (1974: 325) suggests compensating the decrease of Latin knowledge in pupils “by teaching word-formation patterns as well as the formal means including all those affixes derived from Latin” - an approach also suggested by Nation (1990: 169-171): he lists the most important Latin prefixes and suggests learning them by remembering well-known master words and the meaning of the affixes within.
516 Only the 12 modal verbs (16.67%) and the infinitive marker to are clearly unmotivatable.
517 The same phenomenon is observed by Laufer (1997: 26-27).
518 This can be complemented by Laufer and Bensoussan’s (1982: 13) strategy, according to which learners are supposed to guess whether a word has positive or negative connotations. Thus, comprehension of the unknown word blurred in “That distinction has been blurred by imprecise use of the word language” is aided by the fact that the word imprecise suggests that blurred is viewed as negative.
519 An exercise with nonsense words whose meanings must be guessed to practise this strategy is proposed in Nation (1990: 247-248).
6.3 Concluding remarks
The present study has attempted to close a gap and to supply an empirical comparison of the English and the German language with regard to their degree of consociation.
The potential motivatability and expandability of the 2,500 most frequent corpus-based English and German words were analysed on the basis of several criteria, the most important of which are
1. formal correspondence
2. semantic correspondence
3. completeness of the analysis.
Overall, Leisi is right in claiming that the German vocabulary is more motivatable than the English vocabulary. However, the real-life differences are only minimal, and both languages come above the 50% level. Furthermore, English is slightly more expandable than German, but both languages approach a rate of 100%. If both directions of analysis are combined, English can also be said to be more consociated than German. However, a recalculation that takes into account the different degrees of consociation reverses the results. In both cases, though, the differences are only minimal. Considering all the evidence, the hypothesis that the English language tends to be dissociated in contrast to the considerably consociated German language is therefore untenable, at least as far as the highest-frequency vocabulary ranges are concerned. It would thus seem necessary to consider the empirical results of the present study in future academic teaching on the concept of dissociation.
7 Bibliography
7.1 Printed sources and CD-ROMs
Adams, Valerie (2001): Complex Words in English. Harlow: Pearson.
Aitchison, Jean (1994): Words in the Mind: An Introduction to the Mental Lexicon. Oxford: Blackwell.
Albert, Ruth and Cor J. Koster (2002): Empirie in Linguistik und Sprachlehrforschung: Ein methodologisches Arbeitsbuch. Tübingen: Narr.
Amtsblatt des Bayerischen Staatsministeriums für Unterricht, Kultus, Wissenschaft und Kunst: Lehrplan für das bayerische Gymnasium: Fachlehrplan für Englisch (27. Januar 1992). Munich: Jehle.
Anglin, Jeremy M. (1993): Vocabulary Development: A Morphological Analysis. Chicago: Chicago UP.
Arnaud, Pierre J. L. and Sandra J. Savignon (1997): “Rare Words, Complex Lexical Units and the Advanced Learner.” In: Coady, James and Thomas Huckin (Eds.) (1997): Second Language Vocabulary Acquisition: A Rationale for Pedagogy. Cambridge: CUP, 157-172.
Aronoff, Mark and Frank Anshen (1998): “Morphology and the Lexicon: Lexicalization and Productivity.” In: Spencer, Andrew and Arnold M. Zwicky (Eds.) (1998): The Handbook of Morphology. Oxford: Blackwell, 237-247.
Asher, Ronald E. (Ed.) (1994): The Encyclopedia of Language and Linguistics. 10 vols. Oxford: Pergamon.
Aßbeck, Johann (1990): “Schüler können auch das Lernen lernen: Gedächtnispsychologie und Wortschatzarbeit in der Sekundarstufe II.” In: Der fremdsprachliche Unterricht 102 (1990), 41-46.
Aston, Guy and Lou Burnard (1998): The BNC Handbook: Exploring the British National Corpus with SARA. Edinburgh: Edinburgh UP.
Augst, Gerhard (1975): Lexikon zur Wortbildung: Morpheminventar. 3 vols. Tübingen: Narr.
Augst, Gerhard (1998): Wortfamilienwörterbuch der deutschen Gegenwartssprache. Tübingen: Niemeyer.
Augst, Gerhard, Andrea Bauer and Anette Stein (1977): Grundwortschatz und Ideolekt: Empirische Untersuchung zur semantischen und lexikalischen Struktur des kindlichen Wortschatzes. Tübingen: Niemeyer.
Ayto, John (1990): Dictionary of Word Origins. London: Bloomsbury.
Bally, Charles (1909/1951): Traité de stylistique française. 3rd ed. Paris: Klincksieck.
Bally, Charles (1944): Linguistique générale et linguistique française. 2nd ed. Bern: Francke.
Bally, Charles (1965): Linguistique générale et linguistique française. 4th rev. ed. Bern: Francke.
Bammesberger, Alfred and Joachim Grzega (1999): Repetitorium zur englischen Sprachwissenschaft. Heidelberg: Winter.
Barnickel, Klaus-Dieter (2000): Sprachwissenschaft im nicht vertieften Examen: Lösungsvorschläge. 2nd, rev. ed. Erlangen: Institut für Anglistik und Amerikanistik.
Barz, Irmhild (2000): Review of Gerhard Augst’s Wortfamilienwörterbuch der deutschen Gegenwartssprache. In: Deutsch als Fremdsprache 37 (2000), 49-51.
Bauer, Laurie (1983): English Word-Formation. Cambridge: CUP.
Bauer, Laurie (1998): “When is a Sequence of Two Nouns a Compound in English?” In: English Language and Linguistics 2 (1998), 65-86.
Bauer, Laurie (2000): “System vs. Norm: Coinage and Institutionalization.” In: Booij, Geert, Christian Lehmann and Joachim Mugdan (Eds.) (2000): Morphologie: Ein internationales Handbuch zur Flexion und Wortbildung. Berlin: de Gruyter, 832-840.
Baugh, Albert C. and Thomas Cable (1935/1993): A History of the English Language. 4th ed. London: Routledge.
Becker, Hellmut (1983): “Zensuren als Lebenslüge und Notwendigkeit.” In: Becker, Hellmut and Hartmut von Hentig (Eds.) (1983): Zensuren: Lüge - Notwendigkeit - Alternativen. Frankfurt/Main: Klett-Cotta, 11-32.
Beersmans, Franz and Guust Meijers (1999): Review of Gerhard Augst’s Wortfamilienwörterbuch der deutschen Gegenwartssprache. In: Leuvense Bijdragen 88 (1999), 505-508.
Béjoint, Henri (1999): “Compound Nouns in Learners’ Dictionaries.” In: Herbst, Thomas and Kerstin Popp (Eds.) (1999): The Perfect Learners’ Dictionary (?). Tübingen: Niemeyer, 81-99.
Berko, Jean (1958): “The Child’s Learning of English Morphology.” In: Word 14 (1958), 150-177.
Blumenthal, Peter (1997): Sprachvergleich Deutsch - Französisch. 2nd, rev. ed. Tübingen: Niemeyer.
Bollée, Annegret (1995/96): Französische Wortbildung. University of Bamberg: Lecture notes.
Bollée, Annegret (1997/98): Französische Semantik. University of Bamberg: Lecture notes.
Braun, Peter (1979): “Fremdwörter als Internationalismen: Ein Beitrag zur interlinguistischen Behandlung von Fremdwortfragen.” In: Braun, Peter (Ed.) (1979): Fremdwort-Diskussion. Munich: Fink, 95-103.
Bright, William (Ed.) (1992): An International Encyclopedia of Linguistics. 4 vols. Oxford: OUP.
Burgschmidt, Ernst (1976): Sprachwissenschaftliche Termini für Anglisten: Deskriptive Linguistik, Angewandte Linguistik, Historische Linguistik, Fachdidaktik, Textlinguistik. Nürnberg: Burgschmidt.
Burgschmidt, Ernst (1978): Wortbildung im Englischen. Dortmund: Lensing.
Bußmann, Hadumod (2002): Lexikon der Sprachwissenschaft. 3rd, rev. ed. Stuttgart: Kröner.
Butterworth, Brian (1983): “Lexical Representation.” In: Butterworth, Brian (Ed.) (1983): Language Production. Vol. 2. London: Academic Press, 257-294.
Bybee, Joan (1995): “Regular Morphology and the Lexicon.” In: Language and Cognitive Processes 10 (1995), 425-455.
Bybee, Joan (1998): “The Emergent Lexicon.” In: Chicago Linguistic Society 34 (1998), 421-435.
Carroll, Lewis (1871/1970): “Jabberwocky.” In: Carroll, Lewis (1970): The Annotated Alice: Alice’s Adventures in Wonderland and Through the Looking-Glass. London: Penguin, 191-197.
Cassell’s English and German Dictionary (1968). 12th ed. London: Cassell.
Cassell’s German and English Dictionary (1968). 12th ed. London: Cassell.
Chomsky, Noam (1965): Aspects of the Theory of Syntax. Cambridge, Massachusetts: MIT Press.
Clark, Eve (1993): The Lexicon in Acquisition. Cambridge: CUP.
Coates, William Ames (1964): “Meaning in Morphemes and Compound Lexical Units.” In: Lunt, Horace G. (Ed.) (1962): Proceedings of the Ninth International Congress of Linguists in Cambridge, Mass., 1962. The Hague: Mouton, 1046-1051.
Conrad, Rudi (Ed.) (1985): Lexikon sprachwissenschaftlicher Termini. Leipzig: Bibliographisches Institut.
Coseriu, Eugenio (1968): “L’arbitraire du signe: Zur Spätgeschichte eines aristotelischen Begriffes.” In: Archiv für das Studium der neueren Sprachen und Literaturen 204 (1968), 81-112.
Coseriu, Eugenio (1978): Probleme der strukturellen Semantik. Tübingen: Narr.
Cruse, David A. (1986): Lexical Semantics. Cambridge: CUP.
Crystal, David (1980): A First Dictionary of Linguistics and Phonetics. London: Deutsch.
Crystal, David (1995): The Cambridge Encyclopedia of the English Language. Cambridge: CUP.
Crystal, David (1997): The Cambridge Encyclopedia of Language. 2nd ed. Cambridge: CUP.
Cutler, Anne (1981): “Degrees of Transparency in Word Formation.” In: Canadian Journal of Linguistics 26 (1981), 73-77.
Denninghaus, Friedhelm (1976): “Der kontrollierte Erwerb eines potentiellen Wortschatzes im Fremdsprachenunterricht.” In: Praxis des neusprachlichen Unterrichts 23 (1976), 3-14.
Derwing, Bruce L. (1976): “Morpheme Recognition and the Learning of Rules for Derivational Morphology.” In: Canadian Journal of Linguistics 21 (1976), 38-66.
Derwing, Bruce L., Martha L. Smith and Grace E. Wiebe (1995): “On the Role of Spelling in Morpheme Recognition: Experimental Studies with Children and Adults.” In: Feldman, Laurie Beth (Ed.) (1995): Morphological Aspects of Language Processing. Hillsdale, N. J.: Erlbaum, 3-27.
Die deutsche Rechtschreibung (2006). 24th rev. ed. Mannheim: Duden.
Dohmes, Petra, Pienie Zwitserlood and Jens Bölte (2004): “The Impact of Semantic Transparency of Morphologically Complex Words on Picture Naming.” In: Brain and Language 90 (2004), 203-212.
Drews, Etta et al. (1994): “Lexikalische Repräsentation morphologischer Strukturen.” In: Felix, Sascha W., Christopher Habel and Gert Rickheit (Eds.) (1994): Kognitive Linguistik: Repräsentationen und Prozesse. Opladen: Westdeutscher Verlag, 273-298.
Duden - Das große Wörterbuch der deutschen Sprache (2000). CD-ROM. Mannheim: Duden, at <http://www.bibliothek.uni-regensburg.de/dbinfo/einzeln.phtml?bib_id=ub_aandcolors=63andocolors=40andtitel_id=1558>.
Duden Deutsches Universalwörterbuch (2003). 5th, rev. ed. on CD-ROM. Mannheim: Bibliographisches Institut and F. A. Brockhaus.
Eckstein, Doris (2004): Unbewusste Wortwahrnehmung. Münster: Waxmann.
Ellis, Nick C. and Alan Beaton (1993): “Psycholinguistic Determinants of Foreign Language Vocabulary Learning.” In: Language Learning 43 (1993), 559-617.
Empson, William (1930/1965): Seven Types of Ambiguity. Harmondsworth: Penguin.
Empson, William (1951): The Structure of Complex Words. London: Chatto and Windus.
Englische Sprachwissenschaft im schriftlichen Staatsexamen: Eine Orientierungshilfe (2003). Erlangen: Institut für Anglistik und Amerikanistik.
Erades, Peter A. (1956-1957): Review of Ernst Leisi’s Das heutige Englisch. In: Lingua: International Review of General Linguistics 6 (1956-1957), 210-213.
Erben, Johannes (1964): “Deutsche Wortbildung in synchronischer und diachronischer Sicht.” In: Wirkendes Wort 14 (1964), 83-93.
Erk, Heinrich (1985): Wortfamilien in wissenschaftlichen Texten. Ein Häufigkeitsindex. Munich: Hueber.
Evert, Stefan et al. (2004): “Supporting Corpus-Based Dictionary Updating.” In: Williams, Geoffrey and Sandra Vessier (Eds.) (2004): Proceedings of the Eleventh EURALEX International Congress. Lorient: Université de Bretagne-Sud, 255-264.
Feldman, Laurie Beth and Michal Raveh (2002): “When Degree of Semantic Similarity Influences Morphological Processing: Cross Language and Cross Task Comparisons.” In: Shimron, Joseph (Ed.) (2002): Language Processing and Language Acquisition in Languages with Root-Based Morphology. Amsterdam: John Benjamins, 187-200.
Feldman, Laurie Beth and Matthew John Pastizzo (2003): “Morphological Facilitation: The Role of Semantic Transparency and Family Size.” In: Baayen, R. Harald and Robert Schreuder (Eds.) (2003): Morphological Structure in Language Processing. Berlin: de Gruyter, 233-258.
Fill, Alwin (1976): “Synchrone oder diachrone etymologische Kompetenz.” In: Klagenfurter Beiträge zur Sprachwissenschaft 2 (1976), 3-16.
Fill, Alwin (1980): Wortdurchsichtigkeit im Englischen: Eine nicht-generative Studie morphosemantischer Strukturen: Mit einer kontrastiven Untersuchung der Rolle durchsichtiger Wörter im Englischen und Deutschen der Gegenwart. Innsbruck: Innsbrucker Beiträge zur Sprachwissenschaft.
Fill, Alwin (1988): “Purism and Word Formation: Word Substitution in 16th and 19th Century English.” In: Markus, Manfred (Ed.) (1988): Historical English: On the Occasion of Karl Brunner’s 100th Birthday. Innsbruck: Innsbrucker Beiträge zur Kulturwissenschaft, 231-244.
Finkenstaedt, Thomas, Ernst Leisi and Dieter Wolff (1970): A Chronological English Dictionary: Listing 80 000 Words in Order of their Earliest Known Occurrence. Heidelberg: Winter.
Finkenstaedt, Thomas and Dieter Wolff (1973): Ordered Profusion: Studies in Dictionaries and the English Lexicon. Heidelberg: Winter.
Firth, John Rupert (1964): The Tongues of Men and Speech. London: OUP.
Fleischer, Wolfgang and Irmhild Barz (1995): Wortbildung der deutschen Gegenwartssprache. 2nd ed. Tübingen: Niemeyer.
de Florio-Hansen, Inez (1996): “Lernen, wie man Wortschatz lernt: Von der Instruktion zur Lernerautonomie.” In: Der fremdsprachliche Unterricht Französisch 3 (1996), 4-11.
Frauenfelder, Uli H. and Robert Schreuder (1991): “Constraining Psycholinguistic Models of Morphological Processing and Representation: The Role of Productivity.” In: Booij, Geert and Jaap van Marle (Eds.) (1991): Yearbook of Morphology. Dordrecht: Kluwer, 165-183.
Freyd, Pamela and Jonathan Baron (1982): “Individual Differences in Acquisition of Derivational Morphology.” In: Journal of Verbal Learning and Verbal Behavior 21 (1982), 282-295.
Frishberg, Nancy (1975): “Arbitrariness and Iconicity in American Sign Language.” In: Language 51 (1975), 696-719.
Fuhrhop, Nanna (2000): “Zeigen Fugenelemente die Morphologisierung von Komposita an?” In: Thieroff, Rolf et al. (Eds.) (2000): Deutsche Grammatik in Theorie und Praxis. Tübingen: Niemeyer, 201-213.
Gauger, Hans-Martin (1970): Wort und Sprache: Sprachwissenschaftliche Grundfragen. Tübingen: Niemeyer.
Gauger, Hans-Martin (1971): Durchsichtige Wörter: Zur Theorie der Wortbildung. Heidelberg: Winter.
Gelfert, Hans-Dieter (2003): Englisch mit Aha! Die etwas andere Einführung in die englische Sprache. Munich: C. H. Beck.
Geyken, Alexander and Thomas Hanneforth (2006): “TAGH: A Complete Morphology for German Based on Weighted Finite State Automata - Draft.” In: Yli-Jyrä, Anssi, Lauri Karttunen and Juhani Karhumäki (Eds.) (2006): Finite-State Methods and Natural Language Processing: 5th International Workshop, FSMNLP 2005, Helsinki, Finland, September 1-2, 2005, Revised Papers. Berlin: Springer, 55-66.
Gilliéron, Jules (1922): Les étymologies des étymologistes et celles du peuple. Paris: Champion.
Glück, Helmut (Ed.) (2000): Metzler Lexikon Sprache. 2nd, rev. ed. on CD-ROM. Stuttgart: Metzler.
Gneuss, Helmut (1955): Lehnbildungen und Lehnbedeutungen im Altenglischen. Berlin: Schmidt.
Goethe, Wolfgang von (1899): Italienische Reise. Leipzig: Freytag.
Görlach, Manfred (1974): Einführung in die englische Sprachgeschichte. Heidelberg: Quelle und Meyer.
Görlach, Manfred (1986): “Middle English - A Creole?” In: Kastovsky, Dieter and Aleksander Szwedek (Eds.) (1986): Linguistics across Historical and Geographical Boundaries. Vol. 1. Berlin: de Gruyter, 329-344.
Götz, Dieter (1971): Studien zu den verdunkelten Komposita im Englischen. Nuremberg: Hans Carl.
Götz, Dieter (1999): “On Some Differences between English and German (with Respect to Lexicography).” In: Herbst, Thomas and Kerstin Popp (Eds.) (1999): The Perfect Learners’ Dictionary (?). Tübingen: Niemeyer, 221-228.
Gougenheim, Georges et al. (1967): L’élaboration du français fondamental (1er degré): Etude sur l’établissement d’un vocabulaire et d’une grammaire de base. Paris: Didier.
Greenbaum, Sidney (1988): Good English and the Grammarian. London: Longman.
Grove, Victor (1949): The Language Bar. London: Routledge and Kegan Paul.
Handke, Jürgen (1994): “Zugriffsmechanismen im mentalen und maschinellen Lexikon.” In: Börner, Wolfgang and Klaus Vogel (Eds.) (1994): Kognitive Linguistik und Fremdsprachenerwerb: Das mentale Lexikon. Tübingen: Narr, 89-105.
Hansen, Klaus (1978): “Problems in the Semantic Analysis of Compounds.” In: Zeitschrift für Anglistik und Amerikanistik 26 (1978), 247-251.
Hausmann, Franz Josef (2002a): “Nur nützliche Wörter lernen! Durchsichtigkeit des Wortschatzes und Optimierung der Wortschatzarbeit.” In: Französisch Heute 2 (2002), 256-269.
Hausmann, Franz Josef (2002b): “La transparence et l’obstacle: Essai de chrestolexicographie.” In: Etudes de linguistique appliquée 128 (2002), 447-454.
Hausmann, Franz Josef (2004): “Was sind eigentlich Kollokationen?” In: Steyer, Kathrin (Ed.) (2004): Wortverbindungen - mehr oder weniger fest. Berlin: de Gruyter, 309-334.
Hausmann, Franz Josef (2005): Der undurchsichtige Wortschatz des Französischen: Lernwortlisten für Schule und Studium. Aachen: Shaker.
Hausmann, Franz Josef and Herbert Ernst Wiegand (1989): “Component Parts and Structures of General Monolingual Dictionaries: A Survey.” In: Hausmann, Franz Josef et al. (Eds.) (1989): Wörterbücher. Dictionaries. Dictionnaires: Ein internationales Handbuch zur Lexikographie. Vol. 1. Berlin: de Gruyter, 328-360.
Hausser, Roland (1998): “Häufigkeitsverteilung deutscher Morpheme.” In: LDV-Forum 15 (1998), 6-28.
Heid, Ulrich et al. (2004): “Tools for Upgrading Printed Dictionaries by Means of Corpus-Based Lexical Acquisition.” In: Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC-2004). Lisbon, Portugal, 419-422.
Henderson, Leslie (1985): “Toward a Psychology of Morphemes.” In: Ellis, Andrew W. (Ed.) (1985): Progress in the Psychology of Language. Vol. 1. London: Erlbaum, 15-72.
Herbst, Thomas, Rita Stoll and Rudolf Westermayr (1991): Terminologie der Sprachbeschreibung: Ein Lernwörterbuch für das Anglistikstudium. Ismaning: Hueber.
Herbst, Thomas and Michael Klotz (2003): Lexikografie. Paderborn: Schöningh.
Heringer, Hans Jürgen (1984): “Gebt endlich die Wortbildung frei!” In: Sprache und Literatur in Wissenschaft und Unterricht 15 (1984), 43-53.
Hickey, Raymond (2000): “Dissociation as a Form of Language Change.” In: European Journal of English Studies 4 (2000), 303-315.
Hillebrand, Ulrich (1975): Chronologische und etymologische Untersuchungen zum französischen Wortbestand innerhalb der englischen Sprache. Münster: Doctoral thesis.
Hoeksema, Jacob (2000): “Compositionality of Meaning.” In: Booij, Geert, Christian Lehmann and Joachim Mugdan (Eds.) (2000): Morphologie: Ein internationales Handbuch zur Flexion und Wortbildung. Berlin: de Gruyter, 851-857.
Hughes, Geoffrey (1988): Words in Time: A Social History of the English Vocabulary. Oxford: Blackwell.
Hulstijn, Jan H. (1994): “Die Schlüsselwortmethode: Ein Weg zum Aufbau des Lernerlexikons in der Fremdsprache.” In: Börner, Wolfgang and Klaus Vogel (Eds.) (1994): Kognitive Linguistik und Fremdsprachenerwerb: Das mentale Lexikon. Tübingen: Narr, 169-183.
Hulstijn, Jan H. (1997): “Mnemonic Methods in Foreign Language Vocabulary Learning: Theoretical Considerations and Pedagogical Implications.” In: Coady, James and Thomas Huckin (Eds.) (1997): Second Language Vocabulary Acquisition: A Rationale for Pedagogy. Cambridge: CUP, 203-224.
Hunnius, Klaus (1983): Review of Jean Scheidegger’s Arbitraire et motivation en français et en allemand. In: Zeitschrift für romanische Philologie 99 (1983), 212-214.
Ickler, Theodor (1999): “Spekulative Volkslinguistik: Anläßlich des Erscheinens von: Gerhard Augst: Wortfamilienwörterbuch der deutschen Gegenwartssprache.” In: Zeitschrift für Dialektologie und Linguistik 66 (1999), 296-307.
Ingenkamp, Karlheinz (1997): Lehrbuch der Pädagogischen Diagnostik. 4th, rev. ed. Weinheim: Beltz.
Jackendoff, Ray (2002): Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: OUP.
Jespersen, Otto (1905/1982): Growth and Structure of the English Language. 10th ed. Oxford: Blackwell.
de Jong, Nivja et al. (2002): “The Processing and Representation of Dutch and English Compounds: Peripheral Morphological and Central Orthographic Effects.” In: Brain and Language 81 (2002), 555-567.
de Jong, Nivja, Robert Schreuder and R. Harald Baayen (2003): “Morphological Resonance in the Mental Lexicon.” In: Baayen, R. Harald and Robert Schreuder (Eds.) (2003): Morphological Structure in Language Processing. Berlin: de Gruyter, 65-88.
Jurafsky, Daniel (1996): “A Probabilistic Model of Lexical and Syntactic Access and Disambiguation.” In: Cognitive Science 20 (1996), 137-194.
Justice, David (1982): Review of Jean Scheidegger’s Arbitraire et motivation en français et en allemand. In: Romance Philology 36 (1982), 259-263.
Käge, Otmar (1980): Motivation: Probleme des persuasiven Sprachgebrauchs, der Metapher und des Wortspiels. Göppingen: Kümmerle.
Kandler, Günther and Stefan Winter (1992): Wortanalytisches Wörterbuch: Deutscher Wortschatz nach Sinn-Elementen. 10 vols. Munich: Fink.
Kanngießer, Siegfried (1985): “Strukturen der Wortbildung.” In: Schwarze, Christoph and Dieter Wunderlich (Eds.) (1985): Handbuch der Lexikologie. Königstein: Athenäum, 134-183.
Käsmann, Hans (1975): Review of Thomas Finkenstaedt and Dieter Wolff’s Ordered Profusion. In: Anglia: Zeitschrift für englische Philologie 93 (1975), 470-477.
Kastovsky, Dieter (1982): Wortbildung und Semantik. Düsseldorf: Schwann-Bagel.
Keller, Rudi (1994): Sprachwandel: Von der unsichtbaren Hand in der Sprache. 2nd, rev. ed. Tübingen: Francke.
Kemmeter, Luise (1999): Multilingual gestütztes Vokabellernen im gymnasialen Englischunterricht. Frankfurt/Main: Lang.
Kempcke, Günter et al. (1984): Handwörterbuch der deutschen Gegenwartssprache. 2 vols. Berlin: Akademie-Verlag.
Kielhöfer, Bernd (1994): “Wörter lernen, behalten und erinnern.” In: Neusprachliche Mitteilungen aus Wissenschaft und Praxis 47 (1994), 211-220.
Kilgarriff, Adam (1997): “Putting Frequencies in the Dictionary.” In: International Journal of Lexicography 10 (1997), 135-155.
Kluge, Friedrich (2002): Etymologisches Wörterbuch der deutschen Sprache. 24th, rev. ed. on CD-ROM. Berlin: de Gruyter.
Koch, Peter and Daniela Marzo (2007): “A Two-Dimensional Approach to the Study of Motivation in Lexical Typology and its First Application to French High-Frequency Vocabulary.” In: Studies in Language 31 (2007), 259-291.
Koziol, Herbert (1937): Handbuch der englischen Wortbildungslehre. Heidelberg: Winter.
Laca, Brenda (1986): Die Wortbildung als Grammatik des Wortschatzes: Untersuchungen zur spanischen Subjektnominalisierung. Tübingen: Narr.
Lamb, Sydney M. (2001): “Learning Syntax - A Neurocognitive Approach.” In: Pütz, Martin, Susanne Niemeyer and René Dirven (Eds.) (2001): Applied Cognitive Linguistics. Vol. 1. Berlin: de Gruyter, 167-191.
Landau, Sidney I. (1996): Dictionaries: The Art and Craft of Lexicography. Cambridge: CUP.
Langenscheidt e-Großwörterbuch Deutsch als Fremdsprache (2003). Version 4.0. Munich: Langenscheidt.
Laufer, Batia (1989): “A Factor of Difficulty in Vocabulary Learning: Deceptive Transparency.” In: AILA Review 6 (1989), 10-20.
Laufer, Batia (1997): “What’s in a Word that Makes it Hard or Easy: Some Intralexical Factors that Affect the Learning of Words.” In: Schmitt, Norbert and Michael McCarthy (Eds.) (1997): Vocabulary: Description, Acquisition and Pedagogy. Cambridge: CUP, 140-156.
Laufer, Batia (1998): “The Development of Passive and Active Vocabulary in a Second Language: Same or Different?” In: Applied Linguistics 19 (1998), 255-271.
Laufer, Batia and Marsha Bensoussan (1982): “Meaning Is in the Eye of the Beholder.” In: English Teaching Forum 20 (1982), 10-14.
Le Nouveau Petit Robert (2007). Paris: Le Robert.
Leech, Geoffrey (1981): Semantics: The Study of Meaning. 2nd ed. London: Penguin.
Leisi, Ernst (1953): “The Problem of the ‘Hard Words’.” In: English Studies 34 (1953), 262-267.
Leisi, Ernst (1955): Das heutige Englisch: Wesenszüge und Probleme. Heidelberg: Winter.
Leisi, Ernst (1960): Das heutige Englisch: Wesenszüge und Probleme. 2nd ed. Heidelberg: Winter.
Leisi, Ernst (1961): “Deutsch und Englisch: Ein Vergleich zwischen zwei Sprachen.” In: Muttersprache 71 (1961), 257-264.
Leisi, Ernst (1981): “Traditionelle Linguistik.” In: Die neueren Sprachen 80 (1981), 378-390.
Leisi, Ernst (1985): Praxis der englischen Semantik. 2nd, rev. ed. Heidelberg: Winter.
Leisi, Ernst and Christian Mair (1999): Das heutige Englisch: Wesenszüge und Probleme. 8th, rev. ed. Heidelberg: Winter.
Levi, Judith N. (1978): The Syntax and Semantics of Complex Nominals. New York: Academic Press.
Lewandowski, Theodor (1973): Linguistisches Wörterbuch. 3 vols. Heidelberg: Quelle and Meyer.
Libben, Gary et al. (2003): “Compound Fracture: The Role of Semantic Transparency and Morphological Headedness.” In: Brain and Language 84 (2003), 26-43.
von Lindheim, Bogislav (1956): Review of Ernst Leisi’s Das heutige Englisch. In: Anglia: Zeitschrift für englische Philologie 74 (1956), 271-274.
Lipka, Leonhard (1992): An Outline of English Lexicology: Lexical Structure, Word Semantics, and Word-Formation. 2nd ed. Tübingen: Niemeyer.
Lipka, Leonhard (2002): English Lexicology. Tübingen: Narr.
Ljung, Magnus (1974): A Frequency Dictionary of English Morphemes. Stockholm: AWE/Gebers.
Longman Dictionary of Contemporary English (2005). 4th ed. with Writing Assistant. Harlow: Pearson.
Longman Dictionary of Contemporary English (2005). 4th ed. with Writing Assistant. CD-ROM. London: Longman.
Lu, Angela Yi-chün (1998): Phonetic Motivation: A Study on the Relationship between Form and Meaning. Munich: Hieronymus.
Lübke, Diethard (1984): “Der potentielle Wortschatz im Französischen.” In: Praxis des neusprachlichen Unterrichts 31 (1984), 372-379.
Macht, Konrad and Friedrich Steiner (1983): Erfolgsfaktoren des Vokabellernens: Untersuchungen zum aktiven englischen Wortschatz von Hauptschulabgängern. Augsburg: Universität Augsburg.
Malmkjaer, Kirsten (1991): The Linguistics Encyclopedia. London: Routledge.
Mander, Gabrielle (2000): Ltle bk of txt msgs. London: O’Mara.
Marchand, Hans (1960): The Categories and Types of Present-Day English Word-Formation: A Synchronic-Diachronic Approach. Wiesbaden: Harrassowitz.
Marslen-Wilson, William D. et al. (1994): “Morphology and Meaning in the English Mental Lexicon.” In: Psychological Review 101 (1994), 3-33.
Marzo, Daniela and Verena Rube (2005): “What Do you Think where Words Come from? Investigating Lexical Motivation Empirically.” In: Solovyev, Valery, Vera Goldberg and Vladimir Polyakov (Eds.) (2005): The VIII-th International Conference “Cognitive Modeling in Linguistics”. Proceedings. Vol. 1. Kazan: Kazan State University, 152-161.
Matthews, Peter (1974): Morphology: An Introduction to the Theory of Word-Structure. Cambridge: CUP.
Mauger, Gaston and Georges Gougenheim (1963): Le français élémentaire: Méthode progressive de français usuel. Débutants. Vol. 2. Paris: Hachette.
Mauger, Gaston and Georges Gougenheim (1964): Le français élémentaire: Méthode progressive de français usuel. Débutants. Vol. 1. Paris: Hachette.
Mayer, Erwin (1962): Sekundäre Motivation: Untersuchungen zur Volksetymologie und verwandten Erscheinungen im Englischen. Cologne: Doctoral thesis.
McQueen, James M. and Anne Cutler (1998): “Morphology in Word Recognition.” In: Spencer, Andrew and Arnold M. Zwicky (Eds.) (1998): The Handbook of Morphology. Oxford: Blackwell, 406-427.
Meara, Paul M. (1982): “Vocabulary Acquisition: A Neglected Aspect of Language Learning.” In: Kinsella, Valerie (Ed.) (1980): Language Teaching Surveys I. Cambridge: CUP, 100-126.
Meara, Paul M. (1983): Vocabulary in a Second Language. Vol. 1. London: Centre for Information on Language Teaching and Research.
Meara, Paul M. (1987): Vocabulary in a Second Language. Vol. 2. London: Centre for Information on Language Teaching and Research.
Mietzel, Gerd (2001): Pädagogische Psychologie des Lernens und Lehrens. 6th, rev. ed. Göttingen: Hogrefe.
Mitterand, H. (1968): Les mots français. Paris: Presses Universitaires de France.
Moore, David S. (1997): Statistics: Concepts and Controversies. New York: Freeman.
Motsch, Wolfgang (1995): “Semantische Grundlagen der Wortbildung.” In: Harras, Gisela (Ed.) (1995): Die Ordnung der Wörter: Kognitive und lexikalische Strukturen. Berlin: de Gruyter, 193-226.
Motsch, Wolfgang (1999): Deutsche Wortbildung in Grundzügen. Berlin: de Gruyter.
Munske, Horst Haider (1983): “Zur Fremdheit und Vertrautheit der ‘Fremdwörter’ im Deutschen: Eine interferenzlinguistische Skizze.” In: Peschel, Dietmar (Ed.) (1983): Germanistik in Erlangen: Hundert Jahre nach der Gründung des deutschen Seminars. Erlangen: Universitätsbund Erlangen-Nürnberg, 559-593.
Nation, Paul (1990): Teaching and Learning Vocabulary. New York: Newbury House.
Nation, Paul (2004): “A Study of the Most Frequent Word Families in the British National Corpus.” In: Bogaards, Paul and Batia Laufer (Eds.) (2004): Vocabulary in a Second Language. Amsterdam: John Benjamins, 3-13.
Oehler, H. (Ed.) (1966): Grundwortschatz Deutsch. Stuttgart: Klett.
Ogden, Charles Kay (1946): Basic English: Englisch mit 850 Wörtern. Heidelberg: Winter.
Onions, Charles Talbot (1966): The Oxford Dictionary of English Etymology. Oxford: Clarendon.
Oxford Advanced Learner’s Compass (2005). CD-ROM. Oxford: OUP.
Oxford Advanced Learner’s Dictionary of Current English (2005). 7th ed. Oxford: OUP.
Oxford English Dictionary on CD-ROM (1994). 2nd ed. Version 1.10. Oxford: OUP.
Paul, Hermann (2002): Deutsches Wörterbuch: Bedeutungsgeschichte und Aufbau unseres Wortschatzes. 10th, rev. ed. Tübingen: Niemeyer.
Pei, Mario (1962): The Families of Words. New York: Harper.
Pinker, Steven (1995): The Language Instinct. New York: HarperCollins.
Plag, Ingo (2003): Word-Formation in English. Cambridge: CUP.
von Polenz, Peter (1967): “Fremdwort und Lehnwort sprachwissenschaftlich betrachtet.” In: Muttersprache 77 (1967), 65-80.
Püschel, Ulrich (1978): “Wortbildung und Idiomatik.” In: Zeitschrift für germanistische Linguistik 6 (1978), 151-167.
Quirk, Randolph et al. (1985): A Comprehensive Grammar of the English Language. London: Longman.
Randall, Janet H. (1988): “Of Butchers, Bakers and Candlestickmakers: The Problem of Morphology in Understanding Words.” In: Davison, Alice and Georgia M. Green (Eds.) (1988): Linguistic Complexity and Text Comprehension: Readability Issues Reconsidered. London: Erlbaum, 223-247.
Rettig, Wolfgang (1981): Sprachliche Motivation: Zeichenrelation von Lautform und Bedeutung am Beispiel französischer Lexikoneinheiten. Frankfurt/Main: Lang.
Ricken, Ulrich (1983): Französische Lexikologie: Eine Einführung. Leipzig: Verlag Enzyklopädie.
Ronneberger-Sibold, Elke (1980): Sprachverwendung - Sprachsystem: Ökonomie und Wandel. Tübingen: Niemeyer.
Ronneberger-Sibold, Elke (1997): “Foreign Elements in German and French Trade Names.” In: Hickey, Raymond and Stanisław Puppel (Eds.) (1997): Language History and Linguistic Modelling: A Festschrift for Jacek Fisiak on his 60th Birthday. Vol. 2. Berlin: de Gruyter, 1781-1800.
Ronneberger-Sibold, Elke (2001): “On Useful Darkness: Loss and Destruction of Transparency by Linguistic Change, Borrowing and Word Creation.” In: Booij, Geert and Jaap van Marle (Eds.) (2001): Yearbook of Morphology 1999, 97-120.
Ronneberger-Sibold, Elke (2002): “Volksetymologie und Paronomasie als lautnachahmende Wortschöpfung.” In: Habermann, Mechthild, Peter O. Müller and Horst Haider Munske (Eds.) (2002): Historische Wortbildung des Deutschen. Tübingen: Niemeyer, 105-127.
Rufener, John (1971): Studies in the Motivation of English and German Compounds. Zürich: Doctoral thesis.
Sandra, Dominiek (1993): “The Use of Lexical Morphology as a Natural Mnemonic Aid in Learning Foreign Language Vocabulary.” In: Chapelle, Jacques and Marie-Thérèse Claes (Eds.) (1993): Proceedings of the First International Congress on Memory and Memorization in Acquiring and Learning Languages. Louvain-la-Neuve: CLL, 263-294.
Sauer, Hans (1982): Review of Alwin Fill’s Wortdurchsichtigkeit im Englischen. In: Anglia 100 (1982), 467-471.
de Saussure, Ferdinand (1916/1960): Cours de linguistique générale. Paris: Payot.
de Saussure, Ferdinand (1916/1974): Course in General Linguistics. Trans. Wade Baskin. London: Fontana/Collins.
de Saussure, Ferdinand (1916/1983): Course in General Linguistics. Trans. Roy Harris. Ed. Charles Bally and Albert Sechehaye. London: Duckworth.
Sauvageot, Aurélien (1964): Portrait du vocabulaire français. Paris: Larousse.
Scheidegger, Jean (1981): Arbitraire et motivation en français et en allemand: Examen critique des thèses de Charles Bally. Bern: Francke.
Scheler, Manfred (1977): Der englische Wortschatz. Berlin: Schmidt.
Scherfer, Peter (1989): “Vokabellernen.” In: Der fremdsprachliche Unterricht 23 (1989), 4-18.
Schippan, Thea (1974): “Lexikalische Bedeutung und Motivation.” In: Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung 27 (1974), 212-222.
Schmitt, Norbert and Michael McCarthy (Eds.) (1997): Vocabulary: Description, Acquisition and Pedagogy. Cambridge: CUP.
Schreuder, Robert and R. Harald Baayen (1997): “How Complex Simplex Words can Be.” In: Journal of Memory and Language 37 (1997), 118-139.
Schreuder, Robert, Cristina Burani and R. Harald Baayen (2003): “Parsing and Semantic Opacity.” In: Assink, Egbert M. H. and Dominiek Sandra (Eds.) (2003): Reading Complex Words. Amsterdam: Kluwer, 159-189.
Schwarz, Monika (1996): Einführung in die kognitive Linguistik. 2nd, rev. ed. Tübingen: Francke.
Seppänen, Lauri (1978): “Zur Ableitbarkeit der Nominalkomposita.” In: Zeitschrift für Germanistische Linguistik 6 (1978), 133-150.
Shaw, James Howard (1979): Motivierte Komposita in der deutschen und englischen Gegenwartssprache. Tübingen: Narr.
Sinclair, John (1991): Corpus, Concordance, Collocation. Oxford: OUP.
Sperber, Hans (1923): Einführung in die Bedeutungslehre. Bonn: Schroeder.
Sperber, Horst G. (1989): Mnemotechniken im Fremdsprachenerwerb mit Schwerpunkt “Deutsch als Fremdsprache”. Munich: Iudicium.
Starch, Daniel and Edward C. Elliott (1912): “Reliability of the Grading of High-School Work in English.” In: School Review 20 (1912), 442-457.
Starch, Daniel and Edward C. Elliott (1913): “Reliability of Grading Work in Mathematics.” In: School Review 21 (1913), 254-259.
Stein, Gabriele (1974): “Word-Formation and Language Teaching.” In: Die Neueren Sprachen 73 (1974), 316-331.
Stein, Gabriele (1977): “The Place of Word-Formation in Linguistic Description.” In: Brekle, Herbert and Dieter Kastovsky (Eds.) (1977): Perspektiven der Wortbildungsforschung: Beiträge zum Wuppertaler Wortbildungskolloquium vom 9. - 10. Juli 1976: Anläßlich des 70. Geburtstags von Hans Marchand am 1. Oktober 1977. Bonn: Bouvier, 219-235.
Stein, Gabriele (1985): “Word-Formation in Modern English Dictionaries.” In: Ilson, R. (Ed.) (1985): Dictionaries, Lexicography and Language Learning. Oxford: Pergamon, 35-44.
Stein, Gabriele (2002): Developing Your English Vocabulary: A Systematic New Approach. Tübingen: Stauffenburg.
Štekauer, Pavol (2005): Meaning Predictability in Word Formation: Novel, Context-Free Naming Units. Amsterdam: John Benjamins.
Stolz, Jennifer A. and Laurie Beth Feldman (1995): “The Role of Orthographic and Semantic Transparency of the Base Morpheme in Morphological Processing.” In: Feldman, Laurie Beth (Ed.) (1995): Morphological Aspects of Language Processing. Hillsdale, N. J.: Erlbaum, 109-129.
Stork, Antje (2003): Vokabellernen: Eine Untersuchung zur Effizienz von Vokabellernstrategien. Tübingen: Narr.
Taft, Marcus and Paul Kougious (2004): “The Processing of Morpheme-Like Units in Monomorphemic Words.” In: Brain and Language 90 (2004), 9-16.
The Oxford English Dictionary (1989). 2nd ed. Oxford: Clarendon.
The Shorter Oxford English Dictionary on CD-ROM (2002). 5th ed. Version 2.0. Oxford: OUP.
Tournier, Jean (1985): Introduction descriptive à la lexicogénétique de l’anglais contemporain. Paris: Champion-Slatkine.
Ullmann, Stephen (1952): Précis de sémantique française. 5th ed. Bern: Francke.
Ullmann, Stephen (1962): Semantics: An Introduction to the Science of Meaning. Oxford: Blackwell.
Ulrich, Winfried (1972): “Morphologische und semantische Motivation in der deutschen Wortbildung.” In: Muttersprache 5 (1972), 281-290.
Vachek, Josef (1981): Review of Alwin Fill’s Wortdurchsichtigkeit im Englischen. In: Lingua 54 (1981), 273-276.
Vossen, Carl (1992): Mutter Latein und ihre Töchter: Europas Sprachen und ihre Herkunft. 13th rev. ed. Düsseldorf: Stern-Janssen.
Wandruszka, Mario (1969): Sprachen - vergleichbar und unvergleichlich. Munich: Piper.
von Wartburg, Walther (1943): Einführung in Problematik und Methodik der Sprachwissenschaft. Halle: Niemeyer.
von Wartburg, Walther (1943/1946): Problèmes et méthodes de la linguistique. Trans. Pierre Maillard. Paris: Presses universitaires de France.
Weinreich, Uriel (1955): Review of Stephen Ullmann’s Précis de sémantique française. In: Language: Journal of the Linguistic Society of America 31 (1955), 537-543.
Weis, Dieter (1986): “Untersuchungen zur langfristigen Verfügbarkeit von Wortschatz im Leistungsfach Englisch.” In: Neusprachliche Mitteilungen aus Wissenschaft und Praxis 39 (1986), 174-180.
Weisgerber, Leo (1953): Vom Weltbild der deutschen Sprache. Vol. 1. Die inhaltbezogene Grammatik. 2nd ed. Düsseldorf: Schwann.
West, Michael (1953): A General Service List of English Words with Semantic Frequencies and a Supplementary Word-List for the Writing of Popular Science and Technology. London: Longman.
Wilkins, David Arthur (1972): Linguistics in Language Teaching. Cambridge, Massachusetts: MIT Press.
Wolff, Dieter (1969): Statistische Untersuchungen zum Wortschatz englischer Zeitungen. Universität des Saarlandes: Doctoral thesis.
Wunderli, Peter (1989): Französische Lexikologie. Tübingen: Niemeyer.
Zöfgen, Ekkehard (2002): “Motiviertheit lexikalischer Einheiten im Französischen.” In: Kolboom, Ingo, Thomas Kotschi and Edward Reichel (Eds.) (2002): Handbuch Französisch: Sprache - Literatur - Kultur - Gesellschaft: Für Studium, Lehre, Praxis. Berlin: Schmidt, 189-193.
Zwitserlood, Pinie (1994): “The Role of Semantic Transparency in the Processing and Representation of Dutch Compounds.” In: Language and Cognitive Processes 9 (1994), 341-368.

7.2 Internet sources

American Heritage Dictionary. <www.bartleby.com/61/66/Y0026600.html>, 24.10.2006.
Berlin-Brandenburgische Akademie der Wissenschaften: DWDS Core Corpus. <www.dwds.de>, 13.10.2006.
Berlin-Brandenburgische Akademie der Wissenschaften: Information on the DWDS Core Corpus. <www.dwds.de/ueber>, 31.10.2006.
Berlin-Brandenburgische Akademie der Wissenschaften: Information on the moottagger used in the DWDS Core Corpus. <www.dwds.de/erschliessung/pos_tagger>, 31.10.2006.
Canoo. <www.canoo.net>, October 2006.
Deutsche Forschungsgemeinschaft: Vorschläge zur Sicherung guter wissenschaftlicher Praxis. <www.dfg.de/aktuelles_presse/reden_stellungnahmen/download/empfehlung_wiss_praxis_0198.pdf>, 23.11.2006.
Duden - Das große Wörterbuch der deutschen Sprache. <www.bibliothek.uni-regensburg.de/dbinfo/einzeln.phtml?bib_id=ub_a&colors=63&ocolors=40&titel_id=1558>, 23.11.2006.
Google. <www.google.de>, 30.10.2006.
Information on the Oxford English Dictionary. <www.oed.com/about>, 25.05.2005.
Information on the STTS Tagset. <www.ifi.unizh.ch/CL/tagger/UIS-STTS-Diffs.html>, 25.05.2005.
Institut für deutsche Sprache, Mannheim: Grammis. <http://hypermedia.ids-mannheim.de/pls/public/gramwb.ansicht>, October 2006.
Institut für deutsche Sprache, Mannheim: Information on the IDS Corpora. <www.ids-mannheim.de/cosmas2/referenz/korpora.html>, 31.10.2006.
Kilgarriff, Adam: BNC Frequency List lemma.num. <www.kilgarriff.co.uk/BNC_lists/lemma.num>, 31.10.2006 and <ftp://ftp.itri.bton.ac.uk/bnc/lemma.num>, September 2004.
Kilgarriff, Adam: Information on the BNC Frequency Lists. <www.kilgarriff.co.uk/bnc-readme.html>, 13.10.2006.
Kultusministerkonferenz: Educational Standards. <www.kmk.org/schul/Bildungsstandards/1.Fremdsprache_MSA_BS_04-12-2003.pdf>, p. 18, 15.11.2006.
Merriam-Webster Online Search. <www.m-w.com>, October 2006.
MLA Bibliography. <http://web1.infotrac.galegroup.com/itw/infomark/0/1/1/purl=rc6_MLA?sw_aep=uben>, 17.11.2006.
Part of Speech Poem. <www.happychild.org.uk/acc/tpr/mne/0011gram.htm>, 14.10.2006.
Simplified Spelling Society. <www.spellingsociety.org/aboutsss/leaflets/whyeng.php>, 14.10.2006.
Staatsinstitut für Schulpädagogik und Bildungsforschung: Curricula for Bavarian Schools. <www.isb.bayern.de/isb/index.asp?MNav=0&QNav=4&TNav=0&INav=0>, 14.11.2006.
Yahoo (English). <http://uk.yahoo.com>, October 2005.
Yahoo (German). <http://de.yahoo.com>, October 2005.