Fremdsprachen Lehren und Lernen
flul
0932-6936
2941-0797
Narr Verlag Tübingen
This is an open-access article published under the terms of the CC BY 4.0 license. http://creativecommons.org/licenses/by/4.0/
121
2004
331
Gnutzmann Küster Schramm
Corpus-based Word Frequency Analysis and the Teaching of German Vocabulary
121
2004
Randall Jones
The teaching of German vocabulary requires careful thought about the selection and sequencing of individual lexical items. Lexical frequency should be an important criterion in selecting words: in general, the words that occur most frequently in the language should be among those taught in the earlier stages of instruction. After tracing the history of word frequency studies for German, the article describes a recently compiled corpus of contemporary German that forms the basis of a new frequency dictionary of German. It also discusses a number of issues that must be resolved in generating frequency lists.
flul3310165
1. Introduction

A learner of German is immediately exposed to a wide variety of words in everything spoken or written. Understanding these new words becomes a necessity, and the learning curve is often steep. It can seem that for every three words learned, at least one is forgotten: a giant lexical juggling act with no letup. How many words must one learn to be proficient in German, and who decides which words should be taught, and when?

In the teaching of vocabulary in a German language program, be it at the beginning, intermediate, or advanced level, the selection, quantity, and sequencing of individual vocabulary items are important considerations. How many words should be introduced at each stage, and which ones should they be? Which words belong in the beginning stages and which in later stages? How exactly should the words be taught, and how often should they be reinforced later on? Should all words in a logical set, e.g. dative prepositions, personal pronouns, or subordinating conjunctions, be taught in the same lesson, or should they be dispersed throughout the learning period according to need?
And what about polysemy and multiword units? Many words have more than one meaning, and some units of meaning consist of more than one word.

* Correspondence address: Professor Dr. Randall L. JONES, Brigham Young University, Department of Germanic & Slavic Languages, PO Box 26001, PROVO, UT 84602, USA. E-mail: randall_jones@byu.edu. Areas of work: Corpus Linguistics, Technology and Second Language Acquisition, Testing.

2. Language Corpora and Vocabulary

During the past few decades, more and more language professionals have turned to language corpora to obtain lexical and grammatical information about a language. A corpus is simply a collection of written and spoken texts from a specific language, meant to be a representative sample of the language as a whole or of a subset of it delimited by, e.g., genre, register, time period, region, or age of speakers. Since the appearance of the Brown Corpus, officially known as 'A Standard Sample of Present-Day Edited American English, for Use with Digital Computers' (FRANCIS/KUČERA 1964), in the 1960s, corpus linguistics has grown into a large and active discipline that has produced numerous language corpora, conference papers, and publications (see e.g. ASTON 2001, DODD 2000, and KENNEDY 1998). The movement has to a large extent been an attempt to shift the focus of language research from intuitive to data-oriented analysis, i.e. to base assumptions about language on actual samples of authentic language use.

One basic area of corpus analysis is the generation of lexical frequency information. Corpus-based word frequency studies can help with many of the issues raised above. It can be assumed, for example, that the sequencing of vocabulary in a language learning program should in some way reflect the frequency of words as used by native speakers, i.e.
the more frequent a word is in natural language, the earlier it should be introduced. Natural, authentic sentences used as models in second language learning cannot, of course, consist exclusively of high-frequency vocabulary, but it can be argued that frequency should at least be a consideration. This paper describes a recently completed frequency study of German vocabulary based on a broad corpus of contemporary German. Its purpose is principally pedagogical, but it can also be useful to language professionals in other areas.

3. Lexical Frequency and Text Coverage

An important question in the lexical frequency analysis of a text is that of "coverage", i.e. what percentage of a text is accounted for, or "covered", by the 100 (1,000, 2,000, 3,000, etc.) most frequently occurring words. Several corpus-based frequency studies of English, reported in NATION (2001: 6-22), provide interesting information about this. In a study done more than thirty years ago by CARROLL [et al.] (1971), for example, approximately 74% of the running words in a corpus of English texts were covered by the 1,000 most frequently occurring words. The next 1,000 words raised this figure to 81.3%, and another 1,000 words to 85.2%. Achieving 100% coverage would require 86,741 words, i.e. all of the different words in the corpus (NATION 2001: 15).

Number of words    Text coverage (%)
10                 23.7
100                49
1,000              74.1
2,000              81.3
3,000              85.2
4,000              87.6
5,000              89.4
12,448             95
43,831             99
86,741             100

Table 1: Vocabulary size and coverage (CARROLL/DAVIES/RICHMAN 1971) as reported in NATION (2001)

In a more recent study, NATION (2001: 12-13) supplemented the general high-frequency word lists with an academic word list.
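Coverage figures like those in Table 1 can be computed directly from a tokenized text, and the same idea yields the band-by-band breakdown (first 1,000 words, second 1,000 words, academic list, other) used in Nation's academic word list study described next. The Python sketch below is illustrative only: it assumes the text has already been split into word tokens, it counts distinct word forms rather than the lemmas or word families that published studies may use, and the function names are hypothetical.

```python
from collections import Counter


def cumulative_coverage(tokens, list_sizes):
    """Percentage of running words covered by the N most frequent word forms.

    tokens: list of word forms, one entry per running word
    list_sizes: iterable of list sizes N (e.g. 10, 100, 1000)
    """
    # Frequencies of all distinct word forms, highest first.
    ranked = [count for _, count in Counter(tokens).most_common()]
    total = sum(ranked)
    return {n: 100 * sum(ranked[:n]) / total for n in list_sizes}


def band_coverage(tokens, bands):
    """Percentage of running words that fall into each named word list.

    bands: ordered list of (name, set_of_words); each token is assigned
    to the first band that contains it, or to "other" if none does.
    """
    hits = Counter()
    for token in tokens:
        band = next((name for name, words in bands if token in words), "other")
        hits[band] += 1
    return {name: 100 * n / len(tokens) for name, n in hits.items()}


# Toy example: 8 running words, 5 distinct word forms.
tokens = "die Katze und die Maus und die Kuh".split()
print(cumulative_coverage(tokens, [1, 2, 5]))  # {1: 37.5, 2: 62.5, 5: 100.0}

bands = [("1st list", {"die", "und"}), ("2nd list", {"Katze"})]
print(band_coverage(tokens, bands))  # {'1st list': 62.5, '2nd list': 12.5, 'other': 25.0}
```

With a real corpus, the tokens would come from the full text and the bands from published word lists; the percentages in Tables 1 to 3 come from the cited studies, not from this code.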
Because most of the word frequency work Nation was doing concerned non-native learners of English at university level, he felt that vocabulary specific to the university learner would provide a better metric. Table 2 shows that by adding 570 high-frequency words from an academic word list to the 2,000 most frequent words of a general word list, he raised the coverage of an academic corpus to 86%.

Vocabulary                        Coverage (%)
1st 1,000 words                   71.4
2nd 1,000 words                    4.7
Academic Word List (570 words)    10.0
Others                            13.9

Table 2: The coverage by the different kinds of vocabulary in an academic corpus (NATION 2001: 13)

In another study (NATION 2001: 17-19), the texts were subdivided into four types: conversation, fiction, newspapers, and academic texts. Together, the 2,000 most frequent words and the 570 academic words accounted for more than 92% of conversation, 89.1% of fiction, 84.2% of newspapers, and 86.6% of academic texts.

Text type         1st 1,000   2nd 1,000   Academic   Other
Conversation      84.3        6.0         1.9         7.8
Fiction           82.3        5.1         1.7        10.9
Newspapers        75.6        4.7         3.9        15.7
Academic texts    73.5        4.6         8.5        13.3

Table 3: Text type and text coverage (%) by the most frequent 2,000 words of English and an academic word list in four different kinds of texts (NATION 2001: 17)

4. German Word Frequency Studies

Word frequency studies in German go back more than 100 years, to 1897, when F. W. Kaeding published his Häufigkeitswörterbuch der deutschen Sprache (KAEDING 1897). His study was based on a 10-million-word corpus that included a variety of text types, among them newspapers, classical German literature, and official documents. Perhaps surprisingly, the corpus was not compiled for pedagogical purposes; rather, it was intended as an aid in developing a stenographic (shorthand) system for German. The title page states: Festgestellt durch einen Arbeitsausschuß der deutschen Stenographiesysteme ('Established by a working committee of the German shorthand systems'). The work was done entirely by hand.
Every word, together with reference information, was written down on a slip of paper; the slips were then sorted and the information processed.

In 1928 B. Q. Morgan, then of the University of Wisconsin in the United States, saw the value of Kaeding's work and adapted the word count for the express purpose of writing pedagogical material for the teaching of German in American high schools and universities (MORGAN 1928). His book immediately became the basis for vocabulary selection in most U.S. German textbooks for the next thirty years, notwithstanding the fact that the corpus on which the frequency count was based was growing old: it predated World War I and the considerable social change that followed. Furthermore, the texts were all written material and consisted primarily of government documents, academic texts, commercial letters, military texts, newspapers, and classical German literature, much of which was no doubt rather dated even at the time. As late as 1975, Wolf Dieter Ortmann published a book based on the Kaeding material that was intended for teachers and learners of German (ORTMANN 1975).

In 1960-61 J. Alan Pfeffer, then at the University of Buffalo (USA), spent the academic year in Germany supported by a generous U.S. Office of Education grant. During that time he arranged to record 400 brief conversations (ca. 12 minutes each), which were then transcribed and analyzed for word frequency. In 1964 his Basic (Spoken) German Word List was published and replaced Morgan as the basis for vocabulary selection in German language textbooks written for American learners (PFEFFER 1964). One of the attractive features of the Pfeffer frequency list is that the corpus on which it was based was highly structured.
He carefully selected sixty localities throughout the German-speaking area and varied the number of interviews in each area according to population. The interviewers were selected to represent a broad spectrum of the population.

5. Other German Corpora

All of the frequency studies mentioned were based on some kind of corpus or collection of language material. Today there are numerous German language corpora at universities and research centers throughout the world. Some are very specific, e.g. based on a single German newspaper; others include a wide variety of spoken and written samples of the language. A survey of existing German corpora and published German word frequency studies is beyond the scope of this article (see e.g. MEIER 1967, ROSENGREN 1972, RUOFF 1981, SCHERER 1965, and SWENSON 1967). The largest collection of German corpora is held at the Institut für deutsche Sprache in Mannheim. For a variety of reasons, such as the age of much of the material in such corpora, the narrow scope of the texts collected, or unbalanced corpus content, there seemed to be a need for a new, representative, balanced, and more current corpus from which to generate an accurate lexical frequency list. The remainder of this article focuses on this new corpus and discusses how frequency information derived from it can be useful in teaching and learning German vocabulary.

6. The Leipzig/BYU Corpus

The corpus is a joint project of the University of Leipzig and Brigham Young University and is known as the Leipzig/BYU Corpus of Contemporary German. The project was begun in 2001, partly in response to an invitation from Routledge Publishers to contribute to a planned series of language frequency dictionaries. The selection criteria for the German corpus included the following:

• The size would be approximately 4,000,000 words.
• It would include both spoken and written German.
• It would reflect reasonably current language.
• It would include material from Germany, Austria, and Switzerland.
• It would include a wide variety of text types.
• It would represent a balanced selection.

The question of corpus size is disputed within the language corpus community. A reasonable assumption is that the larger the corpus, the more representative it is. The Brown Corpus consists of one million words and was considered more than adequate in its time (FRANCIS/KUČERA 1964). The British National Corpus, which appeared in 1994, has 100 million words (BURNARD 1995). But adequate lexical representation is as much a function of balance as of size: a fifty-million-word corpus of a single German newspaper is probably less representative than a one-million-word corpus of a variety of text types. The Leipzig/BYU Corpus is not particularly large by today's standards. It can be argued, however, that given the carefully designed method of text selection it is sufficient for generating a contemporary frequency list. In other words, with a relatively large number of different text samples drawn from a broad, balanced cross-section of the German-speaking population, a corpus of more than 4 million words should not be necessary.

The spoken component of the corpus comprises one million words: 700,000 running words of conversation and 300,000 words of television material. The conversations were recorded in Austria, Germany, and Switzerland between 1989 and 1993 and transcribed in 1994 (JONES 1997). They consist of 400 conversations of 12-13 minutes each between native speakers representing a broad demographic base. The topics vary widely and include, e.g., current events, politics, local history, childhood memories, the weather, travel, and hobbies. The television materials are transcriptions of a popular German sitcom and several television talk shows.
Although the language of a TV sitcom is scripted and therefore not authentic, television script writers do attempt to create dialog that reasonably reflects the spoken language. Furthermore, actors often ad-lib and speak in a way that is more natural to them.

The written material was taken from literature, newspapers, academic texts, and general everyday prose. The literature section has approximately one million words and represents seven genres: literary fiction, contemporary popular fiction, adolescent fiction, travel writing, humor, romance, and crime/adventure. Approximately 10,000 words each were selected from 100 sources. With a few exceptions, all of the texts have been published since 1990, many of them since 2000.

The one million words of newspaper material were taken from 100 different editions of national and regional newspapers in the three German-speaking countries, published at various times in 2001 and 2002. Texts were selected from the politics, economy, culture, sports, regional news, and opinion sections.

The academic section consists of roughly one million words of material from 116 different sources, including university-level course books, upper-level Gymnasium textbooks, various popular science journals, and technical and scholarly journals. The subject matter includes virtually all topics treated at the Gymnasium and university levels, e.g. natural and social sciences, technology, humanities, art, music, law, and medicine.

The fifth category, general everyday prose, consists of 200,000 words taken from shorter texts, including, e.g., operating instructions, product descriptions, recipes, legal agreements and contracts, customer advice information, and classified advertisements.
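As a quick arithmetic check on the design, the component sizes given above (several of them stated only as approximate) can be tallied. The short Python sketch below merely restates those figures; the dictionary keys are illustrative labels, not names used by the project. It shows that the components add up to the 4,200,000 words mentioned for the tagged corpus in 6.2, with the spoken material making up just under a quarter of the whole.

```python
# Component sizes in running words, as stated in the corpus description
# (the literature and academic figures are given there as approximate).
COMPONENTS = {
    "spoken: conversation": 700_000,
    "spoken: television": 300_000,
    "written: literature": 1_000_000,
    "written: newspapers": 1_000_000,
    "written: academic": 1_000_000,
    "written: everyday prose": 200_000,
}


def composition(components):
    """Return the total size and each component's share in percent."""
    total = sum(components.values())
    shares = {name: 100 * size / total for name, size in components.items()}
    return total, shares


total, shares = composition(COMPONENTS)
spoken_share = sum(v for k, v in shares.items() if k.startswith("spoken"))
print(f"total: {total:,} words")             # total: 4,200,000 words
print(f"spoken share: {spoken_share:.1f}%")  # spoken share: 23.8%
```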
6.1 Corpus Analysis using WordSmith Tools

The software package WordSmith Tools (SCOTT 1999) was selected to generate the word frequency list. The procedure is, however, not as straightforward as one might imagine. First, it is important to define exactly what a "word" is. Table 4 shows the sixty most frequently occurring words in the Leipzig/BYU Corpus as generated by WordSmith Tools.

 1. die     137,656    21. für      26,641    41. am       15,820
 2. der     120,571    22. er       25,077    42. hat      15,476
 3. und     119,147    23. so       24,181    43. noch     15,243
 4. in       70,216    24. dass     24,075    44. nach     15,143
 5. zu       61,581    25. dem      23,950    45. da       14,977
 6. ich      46,953    26. des      22,641    46. was      13,858
 7. den      46,036    27. als      22,327    47. haben    13,453
 8. ist      42,962    28. an       20,155    48. also     13,139
 9. sie      42,932    29. ja       19,898    49. sein     13,127
10. von      41,452    30. wie      19,763    50. wird     12,917
11. nicht    40,251    31. war      18,470    51. wenn     12,803
12. mit      37,488    32. werden   18,260    52. nur      12,571
13. das      36,961    33. bei      17,710    53. einer    12,023
14. es       35,098    34. oder     17,436    54. einen    11,727
15. ein      32,948    35. wir      17,080    55. einem    10,465
16. sich     32,878    36. aber     16,893    56. über     10,214
17. auf      31,046    37. dann     16,632    57. du        9,509
18. auch     30,309    38. man      16,383    58. schon     9,148
19. eine     28,240    39. sind     16,109    59. habe      9,118
20. im       27,880    40. aus      16,042    60. vor       8,895

Table 4: The 60 most frequently occurring words in the Leipzig/BYU Corpus before tagging and lemmatization

These are words as they appear on the page, but not necessarily as they would appear as dictionary entries. There are two problems. First, some of the entries are simply inflected forms of a more basic word: e.g. ist, war, and sind belong to the verb sein, an and am belong together, and die, der, das, etc. belong to the same lemma. Second, mit, auf, bei, aus, and nach are polysemous, i.e. they have more than one meaning or function. They are prepositions as well as verb prefixes, yet each is counted as a single entry. It is necessary to perform two tasks to deal with these problems.
The first is to mark the words in the corpus so that polysemous words can be separated, or disambiguated. The second is to lemmatize the inflected forms, i.e. to collect them under the same lemma or base word.

6.2 POS Tagger

The first task can be performed with a part-of-speech (POS) tagger: special software that examines each word in the corpus and assigns it a tag identifying its part of speech. It does so by analyzing the syntactic environment in which the word occurs and then deciding whether, for example, auf is a preposition or a verb prefix. For this purpose the TreeTagger from the University of Stuttgart was used. Each of the 4,200,000 words was assigned a tag such as [VVBR] (verb), [KONJ] (conjunction), or [APPR] (preposition). When WordSmith Tools processes the tagged corpus, much of the ambiguity has already been resolved by the tagging.

6.3 Lemmatization

The second task is performed within WordSmith Tools. Each inflected form is assigned to a base form, e.g. bin, bist, ist, sind, war, gewesen, etc. are assigned to sein. This task must be done by hand within the on-line frequency list and is very labor-intensive. As Table 5 shows, the final product is a standardized frequency list of the lemmata that one would find in most dictionaries. (The tags are removed before the individual entries are expanded.) Note that the word sein appears twice, once as a verb and once as a pronoun. Note also that several verbs now appear on the list, because accumulating their various inflected forms increased their frequencies considerably.

6.4 Separable Prefix Verbs

A final step in the completion of the frequency list deals with the separable prefix verbs, of which there are several hundred.
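The combined effect of tagging (6.2), lemmatization (6.3), and rejoining separated verb prefixes (6.4) can be sketched in a few lines of Python. This is not the WordSmith Tools or TreeTagger procedure used for the corpus, which, as described, involved substantial manual work; it is a simplified illustration under stated assumptions. The sentences are assumed to be already tagged with tags from the Stuttgart-Tübingen tagset (STTS), to which the [APPR] tag mentioned above belongs: APPR (preposition), PTKVZ (separated verb particle), VVFIN (finite full verb), VAFIN (finite auxiliary), PPOSAT (possessive determiner). The tiny lemma dictionary LEMMAS and the function name are hypothetical, and a separated particle is simply attached to the most recent finite full verb in its sentence.

```python
from collections import Counter

# Hypothetical mini-lexicon: (word form, tag) -> lemma. A real project would
# need a full morphological lexicon or, as described above, manual work.
LEMMAS = {
    ("nimmt", "VVFIN"): "nehmen",
    ("wartet", "VVFIN"): "warten",
    ("hört", "VVFIN"): "hören",
    ("war", "VAFIN"): "sein",
    ("den", "ART"): "der",
}


def lemma_counts(sentences):
    """Count (lemma, tag) pairs in sentences given as lists of (form, tag).

    Keeping the tag in the key separates homographs such as auf[APPR]
    and auf[PTKVZ], or sein[VAFIN] and sein[PPOSAT]. A separated particle
    (PTKVZ) is glued onto the last finite full verb (VVFIN) of the same
    sentence, so "nimmt ... ab" is counted once, as abnehmen.
    """
    counts = Counter()
    for sentence in sentences:
        entries = []      # (lemma, tag) pairs for this sentence
        last_verb = None  # index in `entries` of the most recent VVFIN
        for form, tag in sentence:
            if tag == "PTKVZ" and last_verb is not None:
                verb_lemma, verb_tag = entries[last_verb]
                entries[last_verb] = (form + verb_lemma, verb_tag)
                last_verb = None  # each verb takes at most one particle
                continue
            entries.append((LEMMAS.get((form, tag), form), tag))
            if tag == "VVFIN":
                last_verb = len(entries) - 1
        counts.update(entries)
    return counts


SAMPLE = [
    [("Er", "PPER"), ("nimmt", "VVFIN"), ("den", "ART"), ("Hut", "NN"), ("ab", "PTKVZ")],
    [("Er", "PPER"), ("wartet", "VVFIN"), ("auf", "APPR"), ("den", "ART"), ("Bus", "NN")],
    [("Sie", "PPER"), ("hört", "VVFIN"), ("auf", "PTKVZ")],
    [("Das", "PDS"), ("war", "VAFIN"), ("sein", "PPOSAT"), ("Hut", "NN")],
]

# Print in the lemma[TAG] style of Table 5, most frequent first.
for (lemma, tag), n in lemma_counts(SAMPLE).most_common():
    print(f"{lemma}[{tag}] {n}")
```

The nearest-verb rule is a deliberate simplification: it goes wrong, for example, when a subordinate clause with its own finite verb intervenes between a verb and its particle, which illustrates why reconstructing separable verbs remains an arduous task.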
The frequency counts for words such as ab and nehmen are at first not entirely accurate, because they also include the separated parts of the verb abnehmen. Locating and reconstructing the two parts of these verbs is an arduous task, but WordSmith Tools makes the work considerably easier. The result is an accurate count for all of the high-frequency separable verbs, as well as for the other function of the prefix (usually a preposition) and for the simple verbs.

 1. der[ARTI]     355,910    21. zu[ZUPT]       25,658    41. also[ADVB]     13,139
 2. und[KONJ]     119,147    22. an[PREP]       25,632    42. aus[PREP]      12,946
 3. sein[VER2]    102,879    23. er[PRNP]       25,077    43. da[ADVB]       12,886
 4. in[PREP]      100,477    24. so[ADVB]       24,181    44. all[PRON]      12,867
 5. ein[ARTI]      97,450    25. dass[KONJ]     24,075    45. wenn[KONJ]     12,803
 6. haben[VER2]    55,276    26. können[VER1]   23,652    46. nur[ADVB]      12,571
 7. ich[PRNP]      46,953    27. dies[PRON]     22,594    47. müssen[VER1]   12,556
 8. werden[VER2]   46,042    28. als[KONJ]      22,327    48. sagen[VER1]    12,410
 9. sie[PRNP]      42,932    29. ja[PART]       19,898    49. wie[ADV]       11,029
10. von[PREP]      41,452    30. bei[PREP]      17,710    50. über[PREP]     10,438
11. nicht[NEGP]    40,251    31. oder[KONJ]     17,436    51. machen[VER1]    9,924
12. mit[PREP]      36,732    32. ihr[PRON]      17,105    52. kein[PRON]      9,701
13. der[RELP]      35,203    33. wir[PRNP]      17,080    53. Jahr[SUBS]      9,539
14. es[PRNP]       35,098    34. aber[KONJ]     16,893    54. du[PRNP]        9,509
15. sich[REFP]     32,878    35. dann[ADVB]     16,632    55. geben[VER1]     9,473
16. zu[PREP]       32,572    36. man[PRON]      16,383    56. kommen[VER1]    9,372
17. der[PRON]      31,949    37. sein[PRON]     15,294    57. mein[PRON]      9,312
18. auch[ADVB]     30,309    38. noch[ADVB]     15,243    58. schon[ADVB]     9,148
19. auf[PREP]      28,695    39. nach[PREP]     14,642    59. durch[PREP]     8,979
20. für[PREP]      26,825    40. was[PRON]      13,858    60. vor[PREP]       8,974

Table 5: The 60 most frequently occurring words in the Leipzig/BYU Corpus after tagging and lemmatization*

* The actual tags have been modified slightly in order to be more understandable.

6.5 Size of Frequency List

An important question arises as the work of lemmatization and the reconstruction of separable prefix verbs proceeds: How far down the list should one go? In other words, at what point does it become inefficient to continue? The Leipzig/BYU Corpus of 4,200,000 words yields approximately 230,000 distinct word forms (types, i.e. each form counted only once), and these are inflected forms, not lemmata. About half of these word types are hapax legomena, i.e. they occur only once in the entire corpus. The first entry in the frequency list, the definite article (all forms combined), has a value of 355,910 and accounts for 8.27% of the entire corpus. The first ten words together account for 23.45% of the corpus. Beyond that point the values decrease dramatically. Preliminary studies have shown that Nation's coverage analysis of English texts produces similar results when applied to the Leipzig/BYU Corpus: the 2,000 most frequent words account for approximately 75% to 85% of the corpus, depending on the text register (JONES forthcoming). Spoken texts yield the highest figure, academic texts the lowest. For the 3,000 most frequent words, the figure rises to almost 90% for spoken texts. As further 1,000-word increments are added, coverage increases only slightly. It would therefore seem reasonable that somewhere between 3,000 and 4,000 words would be optimal for a general frequency dictionary.

It is also interesting to look at some of the words at various frequency points (Table 6).
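The distribution check described in the discussion of Table 6 that follows (words such as Kapitel and Therapie whose high counts come from relatively few texts) can be illustrated with range, i.e. the number of texts in which a word occurs. The article does not state which dispersion measure was used for the Leipzig/BYU list, so the Python sketch below is only an illustration under assumptions: the function names, the toy texts, and the cut-off values min_freq and min_share are all hypothetical.

```python
from collections import Counter


def frequency_and_range(texts):
    """Map each word to (frequency, range) over a list of tokenized texts.

    frequency: total number of occurrences in all texts
    range: number of different texts in which the word occurs
    """
    freq, rng = Counter(), Counter()
    for tokens in texts:
        counts = Counter(tokens)
        freq.update(counts)        # adds this text's occurrence counts
        rng.update(counts.keys())  # adds 1 for each word present in this text
    return {word: (freq[word], rng[word]) for word in freq}


def concentrated_words(stats, n_texts, min_freq, min_share):
    """Words that are frequent but occur in too few texts.

    A word is flagged if it occurs at least `min_freq` times but in fewer
    than `min_share` (a fraction between 0 and 1) of the texts.
    """
    return sorted(
        word
        for word, (f, r) in stats.items()
        if f >= min_freq and r / n_texts < min_share
    )


texts = [
    "Kapitel drei ergänzt Kapitel vier und Kapitel fünf".split(),
    "das ist eigentlich gut".split(),
    "das war eigentlich schön".split(),
    "eigentlich ist das neu".split(),
    "das weiß ich eigentlich nicht".split(),
]
stats = frequency_and_range(texts)
print(stats["Kapitel"], stats["eigentlich"])  # (3, 1) (4, 4)
print(concentrated_words(stats, len(texts), min_freq=3, min_share=0.5))  # ['Kapitel']
```

Range ignores how evenly a word is spread over the texts that contain it; measures such as Juilland's D take that into account and are a common alternative.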
Most words seem intuitively to be at the correct point on the frequency scale, but others may seem puzzling. For example, eigentlich, praktisch, and sicherlich (at frequency ranks of roughly 100, 500, and 1,000, respectively) would not seem to be words used with such high frequency, yet they appear relatively high on the list. It should be kept in mind that they occur frequently as particles, especially in the spoken language (roughly a quarter of the corpus). The word Kapitel (rank ≈ 1,000) has what can be called an unbalanced frequency: it occurs mostly in academic books, where it refers to other chapters within the same book. The word Therapie (rank ≈ 1,500) occurs frequently, but in a relatively small number of professional journals dealing with medicine and health.

This raises an important issue for the final review of the frequency list, namely distribution analysis. In addition to word frequency, it is important to consider how a word is distributed across the corpus. Some words have a high frequency value because they are used many times throughout the entire corpus, while others, such as Kapitel and Therapie, have comparably high but misleading frequency values because they are used many times within a relatively small number of texts. For the Leipzig/BYU frequency list, the text distribution was analyzed carefully before the final version was completed, and "out of fit" words were eliminated.

eigentlich bald Kapitel auseinander Charakter Kreuz damit Kunst sicherlich unheimlich dunkel fremd an praktisch soweit Therapie Erkenntnis sorgfältig selbst Beruf rasch normalerweise Erwartung versehen finden Begriff zwanzig rein heftig achtzehn

Table 6: Sample words at various frequency points

7. Conclusion

In a natural language learning environment, the elements of the language are encountered in no particular order, and the words, phrases, and structures that are experienced most frequently are usually the ones learned most quickly.
It is perhaps the best way to learn a language if one has the time and opportunity, and if one is able to self-monitor. In a structured learning environment, the process of sorting out these elements can be enhanced by guiding learners in their exposure to the bits and pieces of the language. For vocabulary, word frequency information can be a valuable asset to both learning and teaching.

References

ASTON, Guy (2001): Learning with Corpora. Houston: Athelstan.
BURNARD, L. (ed.) (1995): User's Guide to the British National Corpus. Oxford: Oxford University Computing Services.
CARROLL, J. B. [et al.] (1971): The American Heritage Word Frequency Book. New York: Houghton Mifflin.
DODD, Bill (ed.) (2000): Working with German Corpora. Birmingham: University Press.
FRANCIS, W. N./KUČERA, H. (1964): Manual of Information to Accompany 'A Standard Sample of Present-Day Edited American English, for Use with Digital Computers' (revised 1979). Providence, RI: Department of Linguistics, Brown University.
JONES, Randall L. (1997): "Creating and Using a Corpus of Spoken German". In: WICHMANN, Anne [et al.] (eds.): Teaching and Language Corpora. London: Longman, 146-156.
JONES, Randall L. (2000a): "A corpus-based study of German accusative/dative prepositions". In: DODD, William J. (ed.): Working with German Corpora. Birmingham: University Press, 116-142.
JONES, Randall L. (2000b): "Textbook German and authentic spoken German: a corpus-based comparison". In: LEWANDOWSKA-TOMASZCZYK, Barbara/MELIA, Patrick James (eds.): PALC'99: Practical Applications in Language Corpora. Frankfurt/M.: Peter Lang, 501-516.
JONES, Randall L. (forthcoming): "An Analysis of Lexical Text Coverage in Contemporary German". In: WILSON, Andrew [et al.] (eds.): Corpus Linguistics Around the World. Amsterdam: Rodopi.
KAEDING, F. W. (1898): Häufigkeitswörterbuch der deutschen Sprache. Steglitz bei Berlin: Selbstverlag des Herausgebers.
KENNEDY, Graeme (1998): An Introduction to Corpus Linguistics. London: Longman.
MEIER, Helmut (1967): Deutsche Sprachstatistik. Hildesheim: Georg Olms.
MORGAN, B. Q. (1928): German Frequency Word Book. New York: Macmillan.
NATION, I. S. P. (2001): Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
ORTMANN, Wolf Dieter (1975): Hochfrequente deutsche Wortformen. München: Goethe-Institut.
PFEFFER, J. Alan (1964): Basic (Spoken) German Word List. Englewood Cliffs, NJ: Prentice-Hall.
ROSENGREN, Inger (1972): Ein Frequenzwörterbuch der deutschen Zeitungssprache. Lund, Sweden: Gleerup.
RUOFF, Arno (1981): Häufigkeitswörterbuch gesprochener Sprache. Tübingen: Niemeyer.
SCHERER, George A. C. (1965): Final Report of the Director on Word Frequency in the Modern German Short Story. Boulder, CO [self-published].
SCOTT, Mike (1999): WordSmith Tools, Version 3. Oxford: Oxford University Press.
SWENSON, Rodney (1967): A Frequency Count of Contemporary German Vocabulary based on Three Current Leading Newspapers. Dissertation Abstracts 28, 2222A-2223A.
