eJournals Arbeiten aus Anglistik und Amerikanistik / Agenda: Advancing Anglophone Studies 49/1

Arbeiten aus Anglistik und Amerikanistik / Agenda: Advancing Anglophone Studies
aaa
0171-5410
2941-0762
Narr Verlag Tübingen
10.24053/AAA-2024-0001
61
2024
491 Kettemann

Schulenglisch: A multi-dimensional model of the variety of English taught in German secondary schools

Elen Le Foll
English as it is taught in German schools (Schulenglisch) is often perceived to be radically different from natural English, as used outside the English as a Foreign Language (EFL) classroom. Previous corpus studies have confirmed that individual lexico-grammatical features are indeed often misrepresented in EFL textbooks used in Germany. This study presents an empirical multi-feature and multi-dimensional (MDA) analysis of the language of three series of EFL textbooks (15 textbook volumes) used at lower secondary school level in Germany, as compared to three target language reference corpora. Principal component analysis (PCA) is applied to identify the defining linguistic characteristics of Schulenglisch along three dimensions of linguistic variation: 1) ‘Written informational vs. Spoken interactional’, 2) ‘Fictional narrative’ and 3) ‘Didactised vs. Real-life English’. The distributions of texts on the first and second dimensions show that Schulenglisch is characterised by an underdifferentiation of register-based variation as compared to ‘real-life’, extra-curricular English. Mixed-effects models show that this finding is consistent across all three textbook series. Intra-textbook variation is mediated – to varying degrees on each of the three dimensions of the model – by text register, the proficiency level targeted by the textbooks, and interactions between these variables. In line with lay beliefs about Schulenglisch, the largest gap between Schulenglisch and extra-curricular English is observed in the conversation register. This gap persists even as textbook proficiency level increases. Compared to transcripts of natural, everyday conversations, Schulenglisch conversation significantly underrepresents key features of spontaneous, interactional spoken English, such as discourse markers, fillers, negation, contracted verbs, demonstratives, and it-pronouns. At the same time, it overrepresents features more typical of written, informative texts, such as nouns, prepositions, and high lexical density and diversity. Across all registers, linguistic features that are typical of Schulenglisch, especially at the lower levels of proficiency, include imperatives, can as a modal, politeness markers and question forms.
1. Introduction

(1) Ich kann ja nur schulenglisch!
‘(But) I only know school English!’¹ <deTenTen18: homepagemodules.de>

In German-speaking countries, utterances such as (1) are frequently heard. The noun compound Schulenglisch - literally: ‘school English’ - even boasts its own entry in the Duden dictionary (duden.de: n.d.). It refers to “a form of English that marks its users as having acquired the language in school” (Grau 2009: 170). Querying a large web-based corpus of German (the deTenTen18; see Jakubíček et al. 2013) for Schulenglisch retrieves numerous instances in which the term is used to emphasise that this level of English proficiency acquired at school either (just about) suffices to complete a specific task (e.g., (2)) or, more frequently, does not (e.g., (3)).

(2) Mit dem Schulenglisch kommt man dort schon klar.
‘You can get by with school English there.’ <deTenTen18: gymnasiumpasewalk.de>

(3) Ich lese keine Bücher auf Englisch, dafür reicht mein Schulenglisch leider nicht aus.
‘I don’t read books in English, unfortunately my school English is not good enough for that.’ <deTenTen18: buchbegegnungen.de>

By the end of lower secondary education, German pupils are expected to have attained B1 level (CEFR; Council of Europe 2020) in English and B2 by the end of general upper secondary. This is comparable to other European countries. The term Schulenglisch, however, does not solely refer to the proficiency level that is acquired by the majority of students by the end of their compulsory schooling; it also characterises the English taught and learnt in German schools as a special variety of English. Thus, in everyday language, Schulenglisch is frequently used in opposition to ‘authentic’ or ‘real-life’ English:

(4) Schulenglisch und das Original - zwei Welten treffen aufeinander...
‘School English and the original - two worlds collide…’ <deTenTen18: realschule-wedemark.de>

(5) Schulenglisch ist eine Sache, das, was man wirklich spricht, eine andere.
‘School English is one thing; how people really speak is another.’ <deTenTen18: buchdownload.at>

This study attempts to shed light on this perceived gap between ‘real-life’ English and Schulenglisch. To this end, the language of three series of bestselling EFL textbooks used in secondary schools in Germany is examined. Though Schulenglisch usually refers more broadly to the variety of English both taught and acquired at school, the present study focuses on learners’ language input rather than their output. For the most part, students’ classroom-based English input stems from teacher talk, student production (both spoken and written) and pedagogical materials, foremost textbooks. Whilst this study focuses on the latter source of input, the following section explains how, in the German school context, EFL textbooks have a direct influence on the other two main sources: teacher and learner English production.

¹ All translations are mine.

1.1. The role of EFL textbooks in secondary schools in Germany

In Germany, education is the responsibility of the federal states, the Bundesländer. However, in describing the teaching of EFL across the country, Kurtz (2019: 116) speaks of the largely textbook-oriented everyday practice of teaching English. This is considered to be particularly true at lower secondary level, i.e., during the first five years of secondary education (Kurtz 2019: 122), where the textbook remains “the medium which has traditionally guided and organised teaching.” Anecdotal evidence from students and teachers suggests that - at least at lower secondary school level - this statement still rings true today. In fact, in some Bundesländer, this reliance on textbooks is more or less directly enshrined in the curriculum. For instance, the English curriculum for Gymnasium [secondary school for high-ability students] in Hessen proclaims that, at least for the first five years of secondary school, the textbook is the “Leitmedium [guiding medium]” (Hessisches Kultusministerium 2010: 4).

The role of textbooks in German EFL classrooms has long been the subject of heated debates (see, e.g., Freudenstein 2001 and ensuing replies in “Praxis des neusprachlichen Unterrichts”). Although scholars and teaching practitioners disagree as to whether textbooks should play such an important role in German EFL classrooms, the fact is that the tables of contents of commercially published EFL textbooks largely determine the syllabus, whilst the chapters and units of these textbooks are often translated one-to-one into lesson planning (see, e.g., Siepmann 2007: 59). Thus, textbooks largely determine both the linguistic and topical focus of lessons. Though some supplementary materials may be used at times, the textbooks’ texts, exercises and tasks make up the vast majority of classroom activities.
As a result, it is fair to say that, at lower secondary level in Germany, textbooks constitute learners’ foremost source of classroom-based English input. Given the centrality of the textbook as the determining Leitmedium, it is evident that considerable proportions of teacher-based and peer-based English-language input in the German EFL classroom are directly or indirectly mediated by the textbook, too. For instance, much of teacher talk at secondary level revolves around the textbook, its explanations, instructions, and tasks, and much of learner writing, teacher-learner and learner-learner spoken interactions are produced on the basis of the tasks, prompts and models proposed by the same textbook (see, e.g., Thornbury 2002).

2. Schulenglisch: State of the art

The empirical study of textbook language has a long tradition in Germany. From the late 1980s onwards, Mindt (1987; 1992; 1995; 2000; 2005) conducted elaborate analyses of the language of German school EFL textbooks as compared to natural English produced by native speakers. Mindt’s pioneering method is firmly corpus-driven. It begins with the compilation of a corpus of naturally occurring speech or pseudo-speech (e.g., play scripts). The focus on spoken English is justified by asserting that the acquisition of oral communicative skills is the foremost aim of secondary school ELT. Based on this corpus data, Mindt extrapolated detailed empirical grammars of specific lexico-grammatical features of English. These empirical grammars - which marked a clear break from the tradition of introspection-based, deductive grammars - were then used to evaluate how these features were represented in both the grammar sections and textual content of school EFL textbooks. To this end, Mindt compared the frequencies, functions, and lexical co-occurrences of each examined feature.
A number of scholars subsequently adopted Mindt’s method and compared the use of additional individual lexico-grammatical features in ‘authentic’ English to that of German school EFL textbooks. The following sections provide an overview of these empirical textbook-based Schulenglisch studies before presenting the present study’s approach and research questions.

2.1. The grammar of Schulenglisch

In the first of such studies, Mindt (1987; 1992) compared representations of future time expressions in textbooks designed for lower secondary level with those actually produced in speech by British native speakers. The examined German EFL textbooks were found to underrepresent will, overrepresent going to and entirely ignore shall as a future time expression. The contracted forms of going to were also observed to be considerably underrepresented and the absence of gonna and ain’t was highlighted as a misrepresentation of ‘real-life’ English language use at the time of the study. Mindt subsequently applied this method to examine the treatment of catenative verbs (1995; 2000) and, here too, reported that some of the most frequent - and consequently likely communicatively useful - verb constructions were absent from school grammars (e.g., need, begin, continue, appear, tend, fail, etc.).

Inspired by this data-driven approach, Schlüter (2002) conducted a book-length corpus-driven analysis of the present perfect. In contrast to Mindt’s earlier works, this empirical grammar was derived on the basis of a native-speaker corpus consisting of both spoken and written English. This empirical grammar of the present perfect was contrasted to traditional, introspection-based grammars, as well as to the grammar sections of two popular series of secondary school EFL textbooks used in Germany, together with their accompanying grammar and activity books.
Schlüter (2002) concluded that the textbooks presented the functions of the present perfect in substantially different ways. For example, the examined textbooks failed to explain that the present perfect progressive is often used to refer to iterative actions or events, focusing almost exclusively on its continuous function.

The third book-length exploration of Schulenglisch to date was conducted by Römer (2005), who examined how the progressive is represented in the dialogues of two popular textbook series also designed for German secondary schools. Unlike Mindt’s earlier analyses, which tacitly assumed that all textbook language ought to represent spoken English, Römer (2005) only examined occurrences of the progressive in the textbook passages explicitly intended to reflect spoken language use (printed dialogues, speech bubbles, transcripts of audio materials, etc.). By comparing these with how the progressive is used in everyday conversation among L1 speakers, her study is one of the few investigations of the language of textbooks to date that accounts for the fact that modality and register are likely to impact how such a grammatical construction is used in context. Among other findings, Römer (2005: 244-5) reported that contracted forms of the auxiliary BE are underrepresented among the progressive forms encountered in the textbook dialogues of her textbook corpus. Concerning the core functions of the progressive, Römer (2005: 260-6) noted that repeated actions or events in the progressive are underrepresented in textbook dialogues. In a conceptual replication study, Le Foll (2022a; 2022b: chap. 4) confirmed that the most noteworthy differences that Römer (2005) observed between the use of the progressive in German EFL textbook dialogues and natural spoken English remain problematic in recent German EFL textbooks.
Using the same comparative corpus-based methodology (and partly the same data), Römer also compared the frequencies, co-occurrence patterns and functions of modal verbs (2004a) and if-conditionals (2004b) in authentic spoken British English and German secondary school EFL textbooks, thereby reporting many striking differences, too. For instance, Römer (2004a) concluded that textbook conversation is characterised by the use of modals that most frequently refer to ability rather than the many other functions that they also fulfil in naturally occurring conversation. In Römer (2004b), the three tense sequences that correspond to what EFL textbooks and grammar books usually refer to as ‘Type 1’, ‘Type 2’ and ‘Type 3’ conditionals were found to be vastly overrepresented in textbook conversation as compared to naturally occurring spoken English. Conversely, Römer (2004b: 159-60) demonstrated that the most frequent tense combination in if-sentences in the spoken component of the BNC1994 (simple present + simple present), as well as several other frequent tense sequences, were significantly underrepresented in the texts of German EFL textbooks. These results have since largely been corroborated by Möller (2020) and Winter and Le Foll (2022) with more recent EFL textbook publications.

2.2. The lexis of Schulenglisch

Whilst it is true that most studies of Schulenglisch to date have focused on grammar, corpus-based analyses have also examined the lexis and pragmatics of German EFL textbooks. For instance, Siepmann (2014) compared the phrasemes featured in the vocabulary sections of two series of secondary school EFL textbooks with a revised version of Martinez & Schmitt’s (2012) list of the most frequent “non-transparent phrasemes” found in the BNC1994. Fewer than a fifth of the phrasemes of the revised corpus-driven list were featured at least once in each textbook series.
Siepmann (2014) concludes that the selection of phrases is therefore not based on frequency or, more worryingly, seemingly on any other systematic criteria. Strikingly, it was also not the case that the number of phrasemes featured in the examined textbooks increased as students’ English proficiency was expected to improve.

2.3. The pragmatics of Schulenglisch

Among the few studies that have begun to explore the pragmatics of Schulenglisch, Limberg’s analyses of markers of politeness (2016a) and apologies (2016b) in German secondary school EFL textbooks both testify to a lack of systematic, recurrent treatment of these pragmatic phenomena. Ways to communicate politeness and to apologise are only rarely explicitly taught or featured in textbook tasks and activities. The few words and phrases that are taught tend to be presented with very little context, resulting in learners not being sufficiently familiarised with the socio-cultural constraints of these lexical units (Limberg 2016a: 286-7).

3. Research questions

In sum, numerous studies have pointed to major differences between how individual linguistic features are represented in Schulenglisch - as captured in EFL textbooks designed for secondary education in Germany - compared to various forms of ‘authentic’, ‘natural’ or ‘real-life’ English. However, to date, no study has attempted to model and provide empirical evidence for the similarities and differences between Schulenglisch and ‘real-life’ English a) across a broad range of linguistic features and b) taking account of systematic variation within Schulenglisch. This study aims to identify and describe the specificities of the language of EFL textbooks used at lower secondary school level in Germany using statistical methods that can model the potential effects and interactions of textbook register, series, and proficiency levels across many different linguistic features.
In doing so, it seeks answers to the following research questions:

1. To what extent does Schulenglisch vary across different: a) text registers, b) textbook proficiency levels and c) textbook series?
2. How does Schulenglisch differ from the kind of naturally occurring English that learners can be expected to encounter and use outside the EFL classroom?

4. Data and methods

This section begins with a description of the school EFL textbook corpus and the three reference corpora employed in this study. Multi-feature/multi-dimensional analysis (MDA) is then introduced as a method for exploring differences and similarities across a wide range of linguistic features before outlining the modifications made to the traditional MDA framework for the present study.

4.1. Corpus description and processing

It is fair to say that, by today’s standards, the reference corpora from which Mindt derived his empirical grammars (see 2.1) are rather small. Moreover, although Mindt’s analyses claimed to focus on representations of spoken English, the textbook language examined mostly consisted of written registers. Indeed, the transcripts and the listening exercises associated with the textbooks were not included in the analyses (Mindt 1987: 53). The choice of fictional texts (novels, plays and films) as reference corpora for so-called ‘authentic English’ may also be called into question. The present study aims to address these issues by analysing a corpus of textbooks that includes the transcripts of audio and video materials (see 4.1.1) and by relying on larger reference corpora that correspond to the kind of language that German EFL learners can be expected to interact with outside the school classroom (see 4.1.2).

4.1.1. Textbook English Corpus (TEC)

In the present study, Schulenglisch data stems from the German subcorpus of the Textbook English Corpus (TEC) (Le Foll 2021a; 2022b).
The TEC is made up of all the texts printed in 43 EFL coursebooks used in secondary schools in France, Germany and Spain, as well as the transcripts of their accompanying audio and video materials. To enable comparisons across different educational systems, their target English proficiency levels have been labelled from A to E, with A corresponding to the first year of EFL instruction at secondary school, and E to the fifth year. The German subcorpus of the TEC comprises 15 textbooks from three textbook series commonly used in Germany (see Table 1). The three series largely follow a content-based, integrated-skills syllabus with each chapter and/or unit covering a different topic with a range of diverse texts, tasks, and activities.

| Publisher | Textbook series | Volume | Age group | Proficiency level | Year of publication |
|---|---|---|---|---|---|
| Klett | Green Line | 1 | 11-12 y. | A | 2006 |
| | | 2 | 12-13 y. | B | 2006 |
| | | 3 | 13-14 y. | C | 2007 |
| | | 4 | 14-15 y. | D | 2008 |
| | | 5 | 15-16 y. | E | 2009 |
| Klett | Green Line New | 1 | 11-12 y. | A | 2014 |
| | | 2 | 12-13 y. | B | 2015 |
| | | 3 | 13-14 y. | C | 2016 |
| | | 4 | 14-15 y. | D | 2017 |
| | | 5 | 15-16 y. | E | 2018 |
| Cornelsen | Access G | 1 | 11-12 y. | A | 2013 |
| | | 2 | 12-13 y. | B | 2014 |
| | | 3 | 13-14 y. | C | 2015 |
| | | 4 | 14-15 y. | D | 2016 |
| | | 5 | 15-16 y. | E | 2017 |

Table 1. Composition of the German subcorpus of the TEC.

The textbooks of the TEC were manually subdivided into text units, where one exercise, reading passage, or transcript corresponds to one text unit. These texts were also annotated for eight major textbook registers: Conversation, Informative writing, Fiction, Personal correspondence (letters, diary entries, social media posts, and e-mails), Instructional (instructions and explanations), Poetry (songs and poems), Other texts (timetables, shopping lists, etc.) and Words & Phrases (e.g., contextless words and sentences from exercises). Defining text units in textbooks is not a trivial task. Numerous possibilities arise (see Le Foll 2020).
One major issue is that many textbook texts are too short for the normalised frequency of most linguistic features to be reliable. Up until now, entire textbook series or volumes have often been conceived as single texts. However, such an approach cannot account for intra-textbook variation. Whether or not a text is of an appropriate length for normalised frequencies to be meaningful depends on the expected frequency of the least frequent linguistic features. For the present study (as in Le Foll 2022b and following Biber 1988), a minimum length of 400 words per text was chosen. This was necessary to ensure that less frequent linguistic features have a reasonable chance of occurring within a text unit and thus obtain comparable feature frequencies. Shorter texts within each textbook volume and register were therefore collated into longer text files. This means that, for example, short, consecutive instructional texts from any one textbook volume were combined until a total word count of at least 400 words was reached. This task was performed sequentially within each textbook volume so that short files of the same register and from within a chapter/unit or across directly adjacent chapters/units were merged. As a result, the progression that the learners are expected to make is retained in the collated text files.

TEC texts classified as Words & Phrases were excluded from the present analyses as this study focuses on coherent texts only. Poetry and Other texts also had to be excluded as there were too few of them in the German subcorpus of the TEC for inclusion in the present multivariate analysis. Following these data preparation steps, 804 textbook texts (hereafter collectively referred to as the TEC-Ger, see Table 2) were analysed.

| Textbook register | Number of texts | Number of words |
|---|---|---|
| Conversation | 267 | 244,992 |
| Fiction | 194 | 166,914 |
| Informative | 92 | 79,012 |
| Instructional | 217 | 195,387 |
| Personal Correspondence | 34 | 27,128 |
| Total | 804 | 713,433 |

Table 2. Composition of the TEC-Ger.

4.1.2. Target Language Reference Corpora

The German Core Curriculum stresses the need for learners of English to learn to deal with “authentic” texts, in particular in listening and reading comprehension (Kultusministerkonferenz 2012: 12, 15, 18). Up until recently, there was no doubt that ‘authenticity’ was defined by native-speaker norms. Today, the German Education Standard for the general higher education entrance level (Abitur) states that:

Sprachlicher Orientierungspunkt sind Standardsprache(n) sowie Register, Varietäten und Akzente, deren Färbung ein Verstehen nicht generell behindert.
‘The linguistic point of reference is standard language(s), as well as registers, varieties and accents, whose distinctiveness does not generally impede comprehension.’ (Kultusministerkonferenz 2012: 14; emphases added)

In reality, however, “standard English” typically amounts to either a British or a US-American English norm. Whilst acknowledging the plurality of different “registers, varieties and accents,” only ‘standard English’ in the German ELT context is associated with the notion of ‘correctness’:

Die Entwicklung der funktionalen kommunikativen Kompetenzen ist bezogen auf die geläufige und korrekte Verfügung über die sprachlichen Mittel in den Bereichen: Aussprache und Intonation, Orthographie, Wortschatz, Grammatik.
‘The development of functional communicative competence [in a foreign language] refers to the typical/frequent and correct use of linguistic features in the areas of: pronunciation and intonation, spelling, vocabulary and grammar.’ (Kultusministerkonferenz 2003: 9; emphases added)

Thus, despite not (officially) adhering to any (specific) native-speaker norm(s), the objectives set out by German educational authorities stipulate that pupils are expected to be taught “correct,” “typical,” and “frequent” English forms.
Whilst measures of correctness necessarily involve some subjective judgements, objective measures of typicality and frequency of occurrence in English as it occurs naturally outside the EFL classroom can be made on the basis of corpus data. At the same time, it is clear that such measures of frequency and typicality will differ depending on the situational context of language use. To answer the second research question, this study focuses on three major Schulenglisch registers: Conversation, Fiction, and Informative texts. To this end, it compares these three register subcorpora of the TEC-Ger with reference corpora of situationally similar target language registers. The following section briefly outlines the composition of these reference corpora. The TEC-Ger Conversation subcorpus is compared to the Spoken BNC2014, an 11.4-million-word corpus of 1,251 orthographically transcribed conversations among L1 speakers in the UK (Love et al. 2017). The Spoken BNC2014 is rich in metadata and has been manually anonymised. For the present study, all mark-ups have been eliminated and anonymising tags replaced with placeholders of the corresponding word class (see Le Foll 2022b for details). The TEC-Ger Fiction subcorpus is compared to the Youth Fiction corpus, which consists of 300 (mostly contemporary) novels targeted at teenagers and young adults (Le Foll 2022b). For the present study, four random samples of approximately 5,000 words were extracted from each of these 300 books (splitting was performed at sentence boundaries, hence the slightly varying word counts), except for three short stories, which were only sampled once each in full. With a total of 1,191 Youth Fiction texts, this procedure resulted in a number of texts comparable to that of the Spoken BNC2014. 
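The sampling of the Youth Fiction corpus described above can be illustrated with a minimal sketch. It is not the script used in the study: the function names, the naive sentence splitter, the use of non-overlapping samples and the fixed random seed are all assumptions made for illustration only.

```python
import random
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sample_at_sentence_boundaries(text, n_samples=4, target_words=5000, seed=0):
    """Draw up to n_samples non-overlapping samples of roughly target_words words.

    Samples start and end at sentence boundaries, so their length only
    approximates target_words. A trailing remainder shorter than
    target_words is never sampled.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        current.append(sentence)
        count += len(sentence.split())
        if count >= target_words:
            # Close the chunk at the end of the sentence that crosses the target.
            chunks.append(" ".join(current))
            current, count = [], 0
    if not chunks:
        # Very short text (e.g. a short story): keep it once, in full.
        return [text.strip()]
    rng = random.Random(seed)
    return rng.sample(chunks, min(n_samples, len(chunks)))
```

Under these assumptions, a novel yields four samples whose word counts vary slightly around the target, while a text shorter than the target is returned once in full, mirroring the treatment of the three short stories.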
The Informative Texts for Teens corpus (hereafter Info Teens) was compiled by first retrieving over 10,000 texts from 14 popular web domains of news and information specially targeted at English-speaking teenagers. Care was taken to include a broad range of topics including current affairs, science, technology, history, and entertainment. Of these, 4,895 text files were under 400 words and were thus discarded. Following a stratified sampling approach, 100 texts from each web domain were randomly selected Elen Le Foll 14 from the remaining texts. This number was chosen to approximately match the number of texts in the other two reference corpora. Fewer than 100 texts longer than 400 words were retrieved from three domains; for these, the full domain subcorpora were retained (see Table 3). Domain name Number of texts Number of words bbc.co.uk/ history 100 74,722 dogonews.com 100 60,762 ducksters.com 100 67,894 encyclopdia.kids.net.au 100 74,566 factmonster.com 100 60,395 historyforkids.net 100 71,955 quatr.us 100 62,254 revisionworld.com (GCSE only) 97 74,301 sciencekids.co.nz 100 57,097 Sciencenewsforstudents.org 100 82,258 teen.wng.org 85 45,515 teenkidsnews.com 100 81,765 teenvogue.com 100 82,117 tweentribune.com 29 26,166 whyfiles.org 100 85,492 Total 1,411 1,007,259 Table 3. Composition of the Info Teens corpus. 4.2. Multivariate analysis method 4.2.1. Multi-feature/ multi-dimensional analysis (MDA) The present study explores Schulenglisch using a modified version of Biber’s (1988) multi-feature/ multi-dimensional analysis (MDA) framework. The traditional MDA framework relies on exploratory factor analysis (EFA) to reduce the co-occurrence patterns of a large matrix of lexico-grammatical feature counts to a parsimonious set of latent factors. The basic procedure, as described by Biber and Gray (2013: 403; see also Berber Sardinha & Veirano Pinto 2019), involves eight steps: Schulenglisch: A multi-dimensional model 15 1. 
Corpus design, text collection and processing 2. Identification of the set of linguistic features to be entered in the MDA 3. Development of scripts (i.e., taggers) to automatically identify these features 4. Tagging of all the texts of the corpus 5. Tag counting and normalisation of the feature frequencies 6. Factor analysis of the feature count matrix 7. Calculation of factor scores and comparison of mean factor scores for relevant groups of texts 8. Interpretation of the factors as underlying dimensions of variation In other words, factors represent groups of linguistic features that tend to co-occur. They are understood to represent major ‘dimensions’ of variation. Each text is attributed a score on each dimension and the mean scores of groups of texts are compared to understand the nature of linguistic similarities and differences between groups of texts. In the first published MDA study, Biber (1988) proposed a model of ‘General Written and Spoken English’ with six functional dimensions of variation. It was elaborated based on the co-occurrence patterns of 67 (largely automatically tagged) linguistic features observed in a large corpus covering a broad range of registers, including face-to-face conversation, press reports, official documents, novels, and letters. The first four dimensions were labelled as: 1. Involved vs. Informational Production 2. Narrative vs. Non-narrative Concerns 3. Situation-dependent vs. Elaborated reference 4. Overt expression of argumentation This pioneering study inspired many subsequent MDA studies exploring functional variation in a range of languages, language varieties and specialised registers (see Biber 2019 for an overview). The present study applies a modified MDA framework to describe Schulenglisch as represented in the TEC-Ger (see 4.1.1) and to compare this variety of English with three reference corpora representing ‘real-life’, extra-curricular English (see 4.1.2). 4.2.2. 
The feature matrix

The texts of the TEC-Ger and of the three reference corpora were tagged using the MFTE Perl (Le Foll 2021c), which tags and counts over 80 different lexico-grammatical features ranging from question tags to the perfect aspect and including semantic features such as verbs of communication and time adverbials. The tagger’s accuracy was formally tested in Le Foll (2021c). The MFTE outputs two tables of normalised frequencies per feature and text. Following the modified MDA framework (Le Foll to appear: chap. 5) applied in the present study, the tagger’s ‘complex normalisation’ output was used. In this matrix of counts, no blanket per-word normalisation basis is applied. Instead, many feature counts are normalised per 100 finite verb phrases (FVPs; e.g., present tense) or 100 nouns (e.g., attributive adjectives) (see Appendix 2). Whilst it is not the standard approach in MDA studies, this kind of normalisation ensures that texts of different lengths can be compared whilst reducing the potential for grammatically induced correlations such as between the number of finite verbs and present tense occurrences or verbal contractions. Indeed, the risk with blanket word-based normalisations is that we obtain dimensions that essentially only distinguish between texts with many finite verbs as opposed to those with relatively few, or dimensions that reflect little more than the relative frequencies of nouns (see Le Foll 2022b: 281-4). In addition to normalisation, and following Biber (1988: 94), the counts were also standardised to z-scores to avoid more common linguistic features exerting undue influence on the model. Finally, the standardised normalised feature frequencies were transformed (with a signed log transformation) to partially compensate for their often highly skewed distributions (cf. Neumann & Evert 2021: 155).
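The three preprocessing steps described above (feature-specific normalisation, z-standardisation, signed log transformation) can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study (see Appendix 3). The counts are invented, the function names are hypothetical, and two details are assumptions not stated in the text: that the signed log takes the common form sign(z) * log(1 + |z|), and that the sample standard deviation is used for standardisation.

```python
import math
from statistics import mean, stdev


def normalise_per_100(count, basis):
    """Frequency of a feature per 100 units of its basis (e.g. FVPs or nouns)."""
    return 100.0 * count / basis if basis else 0.0


def z_standardise(values):
    """Centre on the mean and scale by the sample standard deviation (assumption)."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]


def signed_log(z):
    """Assumed form of the signed log: sign(z) * log(1 + |z|)."""
    return math.copysign(math.log1p(abs(z)), z)


# Hypothetical present-tense counts and finite verb phrase (FVP) totals for four texts.
present_tense = [30, 45, 12, 80]
fvp_totals = [60, 50, 40, 100]

normalised = [normalise_per_100(c, b) for c, b in zip(present_tense, fvp_totals)]
z_scores = z_standardise(normalised)
transformed = [signed_log(z) for z in z_scores]
```

Because the logarithm grows slowly, the transformation pulls extreme z-scores towards the centre while leaving their sign and rank order unchanged, which is how it limits the influence of outlier texts.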
It can be argued that such a transformation makes the interpretation of the results more difficult; however, the traditional MDA framework already applies z-standardisation which also represents a trade-off between interpretability and robustness. Experiments on similar linguistic data suggest that signed log transformation leads to more robust results as it reduces the influence of outlier texts on the multivariate analysis.

Four features tagged by the MFTE were excluded as they were found to be absent from more than a third of texts. Some features tagged by the MFTE also had to be merged with functionally related features to form more general categories due to low communalities (≤ 0.20). This led to the merging of a) the BE-able-to construction with predicative adjectives, b) the BE- and GET-passives categories, c) singular and plural third-person references, and d) time and frequency adverbials. As a result, the final matrix of counts included signed log standardised normalised frequencies for 73 linguistic features. The full list of features is provided in Appendix 2.

4.2.3. Principal component analysis

MDA relies on a multivariate statistical method to explore differences between texts and groups of texts in a multi-dimensional feature space. In a methodological synthesis of MDA studies, Goulart & Wood (2021: 124) report that, to date, this has most commonly been achieved with exploratory factor analysis (EFA). The present study, however, employs principal component analysis (PCA), as did, e.g., Biber & Egbert (2016; 2018) and Neumann & Evert (2021). Whilst EFA aims to identify latent variables by partitioning shared variance from unique and error variance, PCA focuses solely on reducing data dimensionality (Loewen & Gonulal 2015). As a result, EFA is assumed to offer a higher degree of generalisability to other, unsampled variables (Velicer & Jackson 1990a: 17).
That said, studies based on real and simulated datasets have demonstrated that EFA and PCA produce very similar results under most conditions (see Velicer & Jackson 1990a; 1990b for summaries). The degree of similarity increases with larger sample sizes and higher factor saturation (Velicer & Jackson 1990a: 6). One critical factor that can lead to consequential differences, however, is the extraction of too many factors or components (“over-factoring”; Velicer & Jackson 1990a: 10). Schönemann (1990: 47) concludes that the methods are “virtually indistinguishable” as long as the same rotation is used, the same number of factors/components are retained, and the number of retained factors/components is small relative to the total number of observed variables. The latter is typical of corpus-linguistic studies that tend to have far more variables than the PCAs and EFAs typically conducted in other disciplines, e.g., in psychology for scale and questionnaire validation. Given that the two methods are likely to yield highly similar results with this study’s large sample size (4,657 texts) and number of variables (73 linguistic features), PCA was chosen over EFA for its greater transparency and computational stability (see Le Foll 2022b: 284-6). The programming environment R was used to run the PCA and visualise the results (for details of all the R packages, functions and parameters used, see Appendix 3).

4.2.4. Computing and comparing dimension scores

Step 7 of the MDA framework (see 4.2.1) involves computing a factor/dimension score for each dimension and each text. Features with factor loadings below a pre-determined cut-off point are usually excluded from the computation of their respective factor scores. Additionally, if a feature loads onto more than one factor with a loading above this cut-off point, it only contributes to the factor on which it has the highest loading.
In many MDA studies (e.g., Biber 1988), dimension scores are calculated by adding the standardised normalised frequencies for each of the salient positive-loading features and subtracting the salient negative-loading ones. This equates to a dichotomisation of feature factor loadings since all features that have loadings higher (in absolute terms) than the chosen cut-off point make equal contributions to the dimension scores, whilst those that are excluded from the dimension scores do not contribute at all. It has been argued that such unit-weighting approaches to calculating dimension scores can make scores more robust by mitigating error propagation (see Grice 2001: 67-68). However, this study, like Bohmann (2019: 91), adopts an exact scoring method. Dimension scores are calculated by multiplying the standardised normalised feature frequencies of any one text by their respective component loadings on that dimension (which may be positive or negative) and adding all these values. It is therefore no longer necessary to remove low-loading features given that, following this approach, they only make very small, often practically insignificant contributions to dimension scores. Grice’s (2001) simulations on datasets from published psychology studies suggest that exact scoring methods such as this one tend to generate more valid and less biased results than unit-weighting approaches. This approach also has the advantage of allowing features to load on multiple dimensions. This is important given that cross-loading features are very common in MDAs and that most linguistic phenomena are known to be highly correlated. To compare different registers on any one dimension, the mean dimension scores of all the texts in any one register are calculated.
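The difference between the two scoring approaches can be illustrated with a short Python sketch. It is not the study's own code (the analyses were run in R; see Appendix 3). The loadings, feature values, function names and the default cut-off of 0.35 are all hypothetical, and the unit-weighted variant is simplified to a single dimension, so the rule that a cross-loading feature only counts towards its highest-loading dimension is not modelled.

```python
from collections import defaultdict


def exact_score(features, loadings):
    """Exact scoring: weight every feature value by its loading and sum."""
    return sum(features[name] * loading for name, loading in loadings.items())


def unit_weighted_score(features, loadings, cutoff=0.35):
    """Unit weighting: add salient positive-loading features, subtract salient
    negative-loading ones, ignore features below the (hypothetical) cut-off."""
    score = 0.0
    for name, loading in loadings.items():
        if abs(loading) >= cutoff:
            score += features[name] if loading > 0 else -features[name]
    return score


def mean_score_by_group(scores, groups):
    """Mean dimension score per register (or any other grouping label)."""
    grouped = defaultdict(list)
    for score, group in zip(scores, groups):
        grouped[group].append(score)
    return {group: sum(vals) / len(vals) for group, vals in grouped.items()}


# Invented loadings on one dimension and invented transformed frequencies for one text.
loadings = {"CONT": -0.8, "NN": 0.7, "VBD": 0.1}
text = {"CONT": 1.2, "NN": -0.5, "VBD": 0.3}

exact = exact_score(text, loadings)
unit = unit_weighted_score(text, loadings)
means = mean_score_by_group([1.0, 3.0, -2.0], ["Fiction", "Fiction", "Conversation"])
```

In the exact variant the low-loading feature (here VBD) still contributes, but only in proportion to its small loading, which is why no cut-off is needed.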
Such comparisons have typically been quantified and tested for statistical significance using linear regression models (most often with just one predictor in the form of ANOVAs) and their associated coefficients of determination (e.g., Berber Sardinha & Veirano Pinto 2019: 6; Biber 1988: 95; Kruger & van Rooy 2018: 244; Bohmann 2019: 188-90; 2021). A crucial assumption of such models is that the data points be independent of each other. In the context of the present MDA, this assumption is, however, not met. Each of the textbook series of the TEC-Ger has largely been written by the same group of authors. The texts of the TEC-Ger are therefore not truly independent. Similarly, the Youth Fiction and the Info Teens corpora are made up of several samples from any one novel or web domain (see 4.1.2). Hence, in order to quantify group-level patterns across unbalanced group categories and account for variation inherent to the non-independence of some of the texts, the dimension scores of the Conversation, Fiction and Informative texts of the TEC-Ger were compared to those of the three reference corpora using linear mixed-effects models with the R package lme4 (Bates et al. 2015; see Candarli 2022 for a similar approach). For the random effect structure, the categorical metadata variable ‘Source’ was created. It features three levels corresponding to the three textbook series of the TEC-Ger, 300 book levels for Youth Fiction, 14 web domain levels for Info Teens, and one level for the Spoken BNC2014. These categories were chosen as the best-available proxies to capture the variation inherent to each (group of) author(s)/ editor(s) (see Le Foll 2022b: 215-9). For each dimension, a maximal model with the fixed effects of Level (i.e., textbook levels A, B, C, D and E and Reference corpus), register (i.e., Conversation, Fiction, and Informative) and their two-way interactions, and of Source as a random effect was first computed. 
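As a reading aid, the maximal model just described can be written out as a random-intercept model. The notation below is introduced here for illustration only and does not appear in the original analysis: y_ij stands for the dimension score of text i from source j.

```latex
y_{ij} = \beta_0
       + \beta_{\mathrm{Level}[i]}
       + \beta_{\mathrm{Register}[i]}
       + \beta_{\mathrm{Level}[i] \times \mathrm{Register}[i]}
       + u_j + \varepsilon_{ij},
\qquad
u_j \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

The fixed-effect terms capture the systematic effects of Level, Register and their interaction, whilst the random intercept u_j absorbs the variation shared by all texts from the same Source (textbook series, novel, web domain, or the Spoken BNC2014 as a whole).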
Subsequently, backward model selection based on the Akaike information criterion (an estimator of prediction error which takes account of both goodness of fit and model complexity) was performed (as explained in Zuur et al. 2009: 120-8). Model diagnostic plots were also inspected to check the assumptions of linearity, homogeneity of variance, and the normal distribution of residuals of the models (see Appendix 3 for data, code, and full details of the analyses).

4.3. Results

PCA was applied to the matrix of 4,657 texts (i.e., the 804 texts of the TEC-Ger, 1,411 from the Info Teens, 1,251 from the Spoken BNC2014 and 1,191 from the Youth Fiction corpus) by 73 feature frequencies (see Appendix 2), all z-standardised and signed log-transformed as outlined in 4.2.2. The matrix’s overall KMO factor adequacy index of 0.93 indicated that the data is highly suitable for this type of analysis (Kaiser & Rice 1974: 112). To determine the number of components to explore, a scree plot was generated (see Fig. 1). It shows the amount of variance each component captures. Following the “scree test” method (see, e.g., Costello & Osborne 2005: 3), the first three components were retained for further analysis. Together, they account for 46% of the total variance of the matrix.

Fig. 1. Scree plot of the eigenvalues of the principal components (PCs).

Fig. 2. Projections of the texts of the TEC-Ger and the three reference corpora on the three dimensions of the model (PC1, PC2 and PC3) (see Appendix 4 for 3-D version).

Fig. 2 visualises the position of the texts of the corpora on the three dimensions of the model. In this plot and all subsequent scatterplots, each point represents the position of a single text: textbook texts are denoted with filled circles, whereas triangles represent the texts of the reference corpora. The colours of the points correspond to the texts’ register category and subcorpus.
The closer two texts are, the more linguistic similarities they share. The first finding to emerge from this three-dimensional plot is that register-based variation is considerably less marked in the texts of the TEC-Ger than across the three reference corpora. Indeed, whilst the first three dimensions reveal well-defined clusters for the reference corpora (red, bright green and dark blue triangles), with hardly any overlap, the five textbook registers overlap considerably. The only textbook register with a very distinct linguistic profile is that of instructional language (in orange), as highlighted on PC3. Fig. 2 also suggests that the TEC-Ger Fiction texts and those of the Youth Fiction corpus share many linguistic similarities. Though to a lesser extent, this is also true of the Informative textbook texts and the Info Teens corpus. By contrast, there are far fewer overlaps between the clusters of the TEC-Ger Conversation and the Spoken BNC2014 on all three dimensions. This suggests a considerable gap between natural conversational English and how spoken English is taught in German schools. These comparisons will be explored in more depth in what follows. To this end, mixed-effects models of the dimension scores are used to tease out the extent to which different factors drive language variation on each of the model’s three dimensions of linguistic variation.

Together, the first two dimensions (PC1 and PC2) explain 39% of the total variance of the feature matrix. Fig. 3 shows that, whilst all five textbook register ellipses (representing 90% confidence intervals) are concentrated in the middle of the plot, those of the three target reference corpora are clearly separated from one another. Nonetheless, it hints at some register-based variation within textbooks, too.
This is particularly true of the first dimension (PC1), which represents a continuum between spontaneous real-life spoken interactions (as captured by the Spoken BNC2014) at the negative end of the cline and informationally dense, written texts at the positive end (as captured by the Info Teens corpus). On this dimension, the Conversation and Informative texts of the TEC-Ger are clearly separated. This dimension is very reminiscent of Biber’s (1988) Dimension 1. It can be functionally interpreted as a dimension representing ‘Written informational vs. Spoken interactional’ variation. Similarly, the second dimension (PC2) to emerge from the present PCA echoes Biber’s (1988) Dimension 2. It serves to distinguish between fiction and the other four registers examined. Hence, here, only the negative pole is labelled as: ‘Fictional narrative’.

Fig. 3. Projection of the texts of the TEC-Ger and the three reference corpora on PC1 and PC2.

Fig. 4 shows the linguistic features that make the greatest contributions to the first and second dimensions and hence to the position of the texts on Fig. 3. Longer arrows and darker shades represent the strongest contributions. Fig. 4 confirms that the first two dimensions share many similarities with Biber’s (1988) first two dimensions as, on the first dimension, many of the strongest contributing features overlap with those that also have the highest absolute loadings on Biber’s (1988) Dimension 1, e.g., verbal contractions (CONT) and discourse markers (DMA) at the ‘Spoken interactional’ end and nouns (NN), longer average word length (AWL), prepositions (IN) and higher lexical density (LD) towards the ‘Written informational’ end. Similarly, the ‘Narrative’ pole of the second dimension is characterised by high frequencies of past tense finite verb phrases (VBD) and third-person references (TPP3). In addition, Fig.
4 confirms that multiple features make significant contributions to both the first and second dimensions.

The position of the ellipse corresponding to the Personal Correspondence texts of the TEC-Ger on Fig. 3 indicates that a large proportion of these texts shares similarities with the Fiction texts from the same corpus, whilst others are more akin to the dialogues of the same textbooks. The least overlap between the ellipse of a target reference corpus and its corresponding TEC-Ger register can be observed at the negative end of the first dimension: the texts of the Spoken BNC2014 score considerably lower than the dialogues of the TEC-Ger on this dimension. Conversely, on average, the informative texts found in contemporary German EFL textbooks score lower on PC1 than the functionally similar texts of the Info Teens corpus. On the second dimension, the TEC-Ger Fiction ellipse is also notably shifted away from the ‘Narrative’ end of the dimension as compared to the ellipse of the target reference Youth Fiction corpus. Together, these observations suggest that, in Schulenglisch, text registers are linguistically less clearly delineated than in the kind of ‘real-life’ English that German EFL learners can be expected to encounter outside the EFL classroom.

Fig. 4. Biplot of the features with the strongest contributions to PC1 and PC2 (see Appendix 2 for key to feature abbreviations).

A model predicting PC1 scores with Corpus (two levels: TEC-Ger and Reference), Register (three levels: Conversation, Fiction, and Informative) and their interactions as independent variables explains 88% of the variance (marginal R²) in PC1 scores across these six (sub)corpora. Interestingly, model comparisons show that the variation observable along PC1 is not significantly mediated by the textbooks’ advertised proficiency levels.
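The distinction between marginal and conditional R² values for mixed-effects models can be made concrete with a small Python sketch. It follows one widely used definition (Nakagawa & Schielzeth 2013); whether the study's R packages compute exactly this estimator is an assumption here, and the variance components below are invented for illustration rather than taken from the fitted models.

```python
def r_squared(var_fixed, var_random, var_residual):
    """Return (marginal, conditional) R² from the three variance components.

    marginal    = variance explained by the fixed effects alone
    conditional = variance explained by fixed and random effects together
    """
    total = var_fixed + var_random + var_residual
    marginal = var_fixed / total
    conditional = (var_fixed + var_random) / total
    return marginal, conditional


# Hypothetical variance components (not the study's estimates).
marginal, conditional = r_squared(var_fixed=6.0, var_random=1.0, var_residual=3.0)
```

The conditional value can never be lower than the marginal one; the size of the gap between the two indicates how much variation is attributable to the grouping variable (here, Source).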
For the same model, the conditional R² value, which additionally accounts for variation due to Source, is as high as 94%. Within the random effect structure, however, the coefficient estimates of the three textbook series are negligible: 0.16 [95% CI: 0.07 to 0.25] for Access G, -0.07 [95% CI: -0.19 to 0.05] for Green Line and -0.01 [95% CI: -0.20 to 0.00] for Green Line New (see Appendix 3). This suggests that hardly any of the linguistic variation observable along the first dimension is due to any systematic differences between the three textbook series under examination or the proficiency level for which the textbooks were designed. Crucially, this also means that the substantial gap between the Conversation texts of the TEC-Ger and natural conversation (as captured in the Spoken BNC2014) observed on the first dimension is consistent across different textbook series and textbook proficiency levels. Similarly, a large proportion of informative texts in German EFL textbooks are, across all proficiency levels, very different to those of the Info Teens corpus. We can therefore conclude that variation on Dimension 1 is driven, foremost, by text register and, second, by differences between Schulenglisch and extra-curricular English, rather than by any idiosyncrasies of textbook authors/editors, or as a result of varying textbook proficiency levels.

Fig. 5. Scores on Dimension 2 for the Conversation, Fiction, and Informative texts of the TEC-Ger and the three corresponding reference corpora (Spoken BNC2014, Youth Fiction, and Info Teens) as predicted by the model: lm(PC2 ~ 1 + Level:Register + (1|Source), data = dim.scores). Horizontal lines represent mean predicted scores.

This is not true of the second ‘Narrative’ dimension, where proficiency level is found to be a significant predictor of PC2 scores. That said, as illustrated in Fig.
5 (see also model summaries and comparisons in Appendix 3), this effect is almost entirely driven by the interaction between the Fiction register and Level A (beginner) textbooks. The large gaps between both the Spoken BNC2014 and Level A Conversation texts, on the one hand, and the Youth Fiction corpus and Level A Fiction texts, on the other, are largely due to the fact that the past tense is generally not introduced until the end of the first year of EFL tuition.

The linguistic features that make the greatest contributions to these differences between the Fiction texts of the TEC-Ger and those of the Youth Fiction corpus are plotted in Fig. 6. Here, each data point represents the frequency of the linguistic feature (as normalised by the MFTE; see 4.2.2) in a single text. The grey points correspond to the relative frequencies of the texts of the Youth Fiction corpus, whilst those of the TEC-Ger Fiction are in colour and are subdivided by textbook proficiency level (A to E, see Table 1). The first row of boxplots in Fig. 6 illustrates the main difference between beginner (Level A) Fiction and the remaining Fiction texts of the TEC-Ger: most Level A Fiction texts feature present-tense narration (see (6) for an example), as opposed to the past-tense narration commonly found in most fictional texts (e.g., (7)). Since the present-tense variable contributes to positive scores on PC2, whilst past tense contributes to negative ones (see Fig. 4), this is the main reason why Level A Fiction texts score considerably higher on PC2 than the remaining fictional texts from the TEC-Ger. In addition, Fig. 6 reminds us that the modals could and would, as well as the perfect aspect, are not introduced until the third year of secondary school tuition in German schools. As these features also contribute to lower PC2 scores, their absence in beginner level textbook texts further contributes to the only partial overlap between the TEC-Ger Fiction and the Youth Fiction ellipses on Fig. 3.
Fig. 6. Normalised frequencies of the features that make the greatest contributions to negative Dimension 2 scores in the texts of the TEC-Ger Fiction (subdivided by proficiency levels A to E) and the Youth Fiction reference corpus (Ref.).

On average, the Fiction texts of the TEC-Ger also feature lower frequencies of downtoners, general adverbs, particles and occurrences of the perfect aspect, which are four additional features that characterise the texts of the Youth Fiction corpus on PC2; see (7).

(6) Dick helps the cook in the kitchen and then Dick wants to sleep. The cook shows him a bed in the kitchen. But there are mice there, and Dick can’t sleep. In the morning Dick goes to a shop and takes four bags back to the house. <TEC-Ger: Green Line 1> [2]

(7) The pears and butter went into the saucepan. Never mind the syrup spilled on the floor. “Beezus!” Ramona held up the package of peas. Beezus groaned. Out came the partially cooked chicken while she stirred the thawing peas into the yoghurt and shoved the dish back into the oven. The rice! They had forgotten the rice, which was only beginning to stick to the pan. <Youth Fiction: Beverly (1981): Ramona Quimby, Age 8>

In interpreting the results of the three-dimensional plot in Fig. 2, we already noted that the third dimension (PC3) foremost highlights the unique nature of the instructional texts of the TEC-Ger. This is also evident from the projection of texts on the first and third dimensions in Fig. 7, on which instructional textbook texts clearly score lowest, with little overlap with other registers.

[2] In all corpus examples, the linguistic features discussed in the main text have been highlighted in bold.

Fig. 7. Projection of the texts of the TEC-Ger and the three reference corpora on PC1 and PC3.
The features that contribute most to this highly distinctive linguistic profile are, in descending order of their contributions to negative PC3 scores: WH-questions, imperative verbs, second-person references, communication verbs, the modal can, and yes-no questions, e.g.:

(8) Swap your cards with another group. Check their cards. Do all the action cards go with the name cards? Are the verb forms and spelling right? Write corrections and give them back to the group. […] You can use these phrases: Whose turn is it? It’s my/your/Maria’s turn. <TEC-Ger: Access G 1>

In addition to revealing the linguistic distinctiveness of instructions and explanations in German EFL textbooks, Fig. 7 also suggests that PC3 represents a ‘Didactised vs. Real-life English’ dimension as the ellipses of the TEC-Ger are all shifted towards the negative end of the dimension as compared to those of the three reference corpora which, unlike on PC1 and PC2, are all located within the same region on this dimension. Comparisons of mixed-effects models for PC3 scores showed that, whilst Register and Level as independent variables both contribute to more accurate prediction of PC3 scores, their two-way interactions are not significant. As shown in the visualisation of predicted scores in Fig. 8, the tendency for TEC-Ger texts to be located towards the negative end of this third dimension is strongly associated with the textbooks’ targeted proficiency levels. Thus, as learners are expected to become more proficient, PC3 scores increase. However, for all three registers depicted on Fig. 8, even the most advanced texts of the TEC-Ger score, on average, somewhat lower on this third dimension than the texts of the reference corpora. Hence, this third dimension appears to capture both proficiency-level based linguistic variation, as well as more generalisable differences between Schulenglisch and extra-curricular English.
That said, the random effect variable Source also plays an important role in predicting PC3 scores; in particular, PC3 scores differ considerably between individual books and web domains, resulting in numerous outlier texts outside the Youth Fiction and Info Teens ellipses on Fig. 7. Systematic differences in PC3 scores associated with textbook series, however, are far more modest. On average, Green Line New texts tend to be slightly less shifted towards the extra-curricular English side of Dimension 3 than Access G and the previous edition of Green Line (see Appendix 3 for details).

Fig. 8. Predicted PC3 scores of the Conversation, Fiction, and Informative texts of the TEC-Ger and the three corresponding reference corpora (Spoken BNC2014, Youth Fiction, and Info Teens).

An analysis of the features that contribute to low PC3 scores indicates that, across all registers, Schulenglisch tends to feature more second-person references, yes-no (YNQU) and WH-questions, and politeness markers (e.g., (9)). In addition, a comparison of negative- vs. positive-loading features on PC3 strongly suggests that Schulenglisch relies heavily on the modal can at the expense of all other modal verbs (MDNE, MDMM, MDWS, MDWO, MDCO), which are considerably less frequent in the texts of the TEC-Ger than in extra-curricular English. In general, Schulenglisch is more strongly characterised by the features it largely lacks or underrepresents as compared to naturally occurring English. These include features associated with more complex grammatical constructions such as split auxiliaries, that-subordinate clauses, concessives, passive constructions and, to a lesser extent, that-omission and that-relative clauses. Gradually introducing EFL learners to these features makes pedagogical sense. Encouragingly, the results of the mixed-effects models suggest that this is the intention of the textbook authors given that, as shown in Fig.
8, PC3 scores progressively increase as textbook proficiency level increases.

(9) Can I help you? Yes, have you got the new ‘Pets’ magazine, please? I can’t find it. It’s there next to the sports magazines. Excuse me. Where can I try on this sweatshirt? There, on the left. Thanks. I like the colour, but the size isn’t right. No problem. We’ve got other sizes, too. <TEC-Ger: Green Line 1>

5. Discussion

The first research question focused on intra-textbook variation. It asked: To what extent does Schulenglisch vary across different (a) text registers, (b) textbook proficiency levels, and (c) textbook series? The results of the present study demonstrate that, like all language varieties, Schulenglisch is not monolithic but rather varies in several systematic ways. On the first dimension of linguistic variation that emerged from this PCA-based MDA (and which explains most of the variance in the data), text register represents by far the most important source of variation in Schulenglisch. This is also true of the second dimension which differentiates between the Fiction register and all other Schulenglisch registers, with some overlap between Personal Correspondence and Fiction. On the third, ‘Didactised vs. Real-life English’ dimension, by contrast, textbook proficiency level explains most of the variance in the PC3 scores of the texts of the TEC-Ger. This is largely due to this dimension being characterised by linguistic features that are only gradually introduced to German EFL learners. In addition, the third dimension highlights instructional texts as a very distinct form of language use. With this in mind, it is worth reflecting on the fact that a large proportion of secondary school learners’ classroom-based English input consists of instructional language. It represents over a quarter of the total word counts of German school EFL textbooks (see 4.1.1). Textbook proficiency level is also a driver of linguistic variation on the second dimension.
However, here, the effect is almost dichotomised: only the differences observed between Level A (first year of secondary school) and the remaining levels are significant.

In contrast to register, proficiency level, and their interactions, which were shown to mediate linguistic variation in Schulenglisch to various degrees, the analyses show that Schulenglisch is remarkably homogeneous across the three series of the TEC-Ger on all three dimensions of the model. It would thus seem that the kind of Schulenglisch that EFL learners in Germany are exposed to is highly comparable regardless of the textbook. It goes without saying that the latter observation is highly tentative and that future research ought to analyse the language of other widely used textbook series before it can be confirmed.

The second research question asked how Schulenglisch differs from ‘real-life’, extra-curricular English. The first two dimensions of the model (see Fig. 3) show that Schulenglisch as a variety of English occupies a subspace of the register variation observed in ‘real-life’ English. As illustrated by the overlaps of textbook register ellipses in Fig. 2, a defining characteristic of Schulenglisch appears to be that, apart from instructional texts which form a very distinct register, its registers are linguistically less clearly defined than in natural English. This underdifferentiation of register in Schulenglisch is potentially problematic as it could mislead learners into thinking that English hardly varies across different situational contexts of use (see also Rühlemann 2008).

The comparative analyses have also highlighted similarities and differences between the Conversation, Fiction, and Informative texts of the TEC-Ger and their corresponding target reference corpora. In particular, representations of conversational English in the TEC-Ger were found to differ significantly from natural conversation, as captured in the Spoken BNC2014.
Previous studies have shown that this is also the case in the French and Spanish subcorpora of the TEC (Le Foll 2021a; 2022b; to appear). Although some may argue that transcripts of informal conversations in the UK may not be the ideal corpus to represent the spoken target norm for German secondary school EFL learners (see 4.1.2), such systematic and generalised linguistic differences arguably risk impeding learners’ development of spoken interactional competences. Indeed, this study has shown that textbook dialogues (which include the transcripts of the audio and video materials associated with the textbooks, see 4.1.1) lack many of the features that are characteristic of spontaneous, interactional language, such as discourse markers, fillers and interjections, emphatics, yes/no and tag questions, as well as features that are particularly useful when interlocutors share common knowledge and a common environment, e.g., demonstratives, quantifiers and it-pronouns, see (10). Additionally, representations of spoken English in Schulenglisch are much more nominal and thus feature more prepositions, longer words, and higher type/token ratios, which may result in unrealistic speech models, e.g., (11).

(10) I believe in science though yeah I mean that’s kind of what I believe as well […] exactly I’m not saying it’s true I’m saying that’s what that’s why I don’t believe it because I was saying it’s possible but I don’t believe in it aha which is what I’m saying which I guess you could say is the same thing as just saying I don’t believe in religion <BNC2014: S8Q3>

(11) I’m glad he’s interested in Australia, but next time he should be a bit more careful about what he says. I’ve never believed that all Germans wear Lederhosen and drink beer all day. Maybe he’s got some more real questions about Australia.
<TEC-Ger: Green Line 5>

Whilst textbook dialogues often display more of the features typical of written, informative texts than natural conversation, many of the informative texts of the TEC-Ger are situated closer to the ‘Spoken interactional’ end of the first dimension than those of the reference Info Teens corpus (see Fig. 3). This is the result of texts which, although presented in the form of informative articles in the textbooks, are in fact characterised by numerous features more typical of spontaneous, conversational English. As illustrated in (12), these include contracted verb forms, second-person references, discourse markers, and yes/no questions:

(12) The name “soap opera”, or just “soap”, goes back to radio dramas in the 1930s the commercials were for housewives, and they advertised soap, and other cleaning products. Want to be a star? Want to be discovered? Not so fast! Before you can get anywhere, the programme has to “cast” you first. Have you ever been invited to do a casting? You haven’t? Well, TEENBUZZ tells you all about it. Matt Stirling from EastEnders can give you a few tips, too. First, you talk to an agent and give him or her your photo. <TEC-Ger: Green Line 3>

Overall, the Fiction subcorpus of the TEC-Ger was found to be most similar to its corresponding reference corpus. In many respects, this finding is easily explained: unlike dialogues and informative texts, which are generally specially crafted for pedagogical purposes, many of the narrative texts featured in German EFL textbooks are extracts of published novels and short stories. The analyses merely highlighted the specific nature of beginner-level textbook fiction. Excerpt (13) constitutes a representative example of such a text: it relies on present-tense narration and features high frequencies of BE as a main verb and the modal can. By contrast, excerpt (14) stems from a level E (fifth year of secondary education) text that, on both Fig. 3 and Fig.
7, falls within the ellipses of the Youth Fiction corpus: it features many past tense verbs, third-person references, several phrasal verbs, and a higher type/token ratio.

(13) Holly is at home with her two guinea pigs, Mr Fluff and Honey. They live in the kitchen. But they aren’t in the kitchen now. They’re in Holly’s room. It’s fun for the guinea pigs on the floor. They can explore everywhere in the room! […] After Olivia’s visit Holly can only see Honey under her bed. <TEC-Ger: Green Line New 1>

(14) He ran back to Bill. On the way he picked up a stick. As he came over the hill, he ran at the boar, hitting it again and again. It turned to face Colm. There was blood on its tusks. Bill tried to pull himself away, leaving a trail of blood behind him. Colm raised the stick high and brought it down on the boar’s head. <TEC-Ger: Access G 5>

The third dimension to emerge from the present study uncovered more general, cross-register differences between Schulenglisch and extra-curricular English. Some of these echo findings from previous research. For instance, Römer (2004a) reported that the modal would was the second most frequent modal in the Spoken BNC1994, yet only ranked fifth in her conversational textbook data. In the reference conversational data of the present study, would is often used as a marker of politeness. Interestingly, however, the results show that other, in particular lexical, markers of politeness are more typical of Schulenglisch (cf. Limberg 2016a). These are captured here in the POLITE variable (e.g., apologies, please, thank you; see Appendix 2), which makes a strong contribution to the ‘Didactised’ end of the third dimension. Römer (2004a) also concluded that textbooks overrepresent the ‘ability’ function of modals to the detriment of other uses of modal verbs. This corresponds to the strong contribution of the modal can to the ‘Didactised’ pole of the third dimension.
Finally, the overrepresentation of WH-questions in textbook dialogues as compared to ‘real-life’ spoken English was also observed by Römer (2005) in the context of progressives. However, given that the syntax of questions often poses problems to EFL learners, textbooks’ over-representation of yes/no questions and WH-questions (as observed on the third dimension of the present model) can be argued to serve a well-founded pedagogical aim.

6. Conclusion

Using a revised multi-feature/dimensional analysis (MDA) framework, this study analysed textbook-based Schulenglisch across a wide range of lexical, grammatical, and semantic features (see Appendix 2). In doing so, the language of EFL textbooks used in German secondary schools (see 4.1.1) was described along three dimensions of linguistic variation. These dimensions helped to identify significant sources of internal variation within textbook-based Schulenglisch and exposed systematic patterns of similarities and differences between the dialogues, fiction and informative texts of German EFL textbooks and ‘real-life’, extra-curricular English as used in comparable situational contexts (see 4.1.2).

Whilst some differences may be pedagogically well-founded (e.g., the use of present-tense narration in beginner fictional texts or the over-representation of question forms), other differences, especially those that were found to be persistent across all target proficiency levels, may deprive learners of classroom-based exposure to important features of ‘real-life’ English. The most concerning gap between Schulenglisch and extra-curricular English was observed in representations of conversational English in German EFL textbooks.
This is well illustrated by the distribution of texts on the model’s first dimension, where representations of spoken language in Schulenglisch are notably shifted towards the ‘Written informational’ end of the dimension as compared to natural conversation, as well as on the third dimension, on which many textbook dialogues are closer to textbook instructions and explanations than natural conversation. This finding corroborates commonly held lay beliefs about Schulenglisch such as those expressed in excerpts (5) and (15):

(15) Wer im Ausland lebt, stellt schnell fest, dass das Schulenglisch der letzten Jahre herzlich wenig mit der Alltagssprache gemein hat. ‘Anyone who lives abroad quickly realises that the school English taught in recent years has precious little in common with everyday language.’ <deTenTen18: auslandsaufenthalt.org>

In conclusion, this comparative, multivariate study has provided valuable insights into the nature of Schulenglisch, confirming certain prevalent beliefs regarding its lack of authenticity and limited situational variation. Encouragingly, some of the observed differences between the variety of English taught in German secondary schools and ‘real-life’, extra-curricular English diminish as learners are expected to become more proficient in English. However, it remains evident that Schulenglisch inadequately captures important distinctions between modes and registers, even at more advanced proficiency levels. As young German EFL learners interact more and more with English-medium media in their extra-curricular activities, there is a risk that this observed and perceived gap between Schulenglisch and ‘real-life’ English (further) alienates them from the learning process.
It is therefore hoped that this empirical description of Schulenglisch can help raise awareness of the linguistic specificities of Schulenglisch among textbook authors, editors, and teachers and spur on the development of more corpus-based teaching materials (see, e.g., Friginal & Roberts 2022; Le Foll 2021b) and the inclusion of more authentic materials in the EFL classroom. These materials should reflect the contextual nature of language use while facilitating meaningful pedagogical progressions. The incorporation of (adapted) excerpts of published fictional works originally written for purposes other than language learning in German EFL textbooks demonstrates that, with some modifications for lower proficiency levels, this is entirely feasible. Indeed, this study has shown that, in the context of lower secondary school German EFL textbooks, this approach has led to a more accurate representation of fiction than other Schulenglisch registers such as informative writing and, in particular, spoken conversation. Undoubtedly, further empirical research is warranted to investigate the extent to which more accurate representations of language use in different situational contexts can contribute to fostering a more engaging and effective learning experience among secondary school EFL learners.

References

Bates, Douglas, Martin Mächler, Ben Bolker & Steve Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67 (1): 1-48. https://doi.org/10.18637/jss.v067.i01
Berber Sardinha, Tony & Marcia Veirano Pinto (Eds.). (2019). Multi-Dimensional Analysis: Research Methods and Current Issues. London: Bloomsbury Academic. https://doi.org/10.5040/9781350023857
Biber, Douglas (1988). Variation across Speech and Writing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511621024
Biber, Douglas (2019). Multi-dimensional analysis: A historical synopsis. In: Tony Berber Sardinha & Marcia Veirano Pinto (Eds.). Multi-Dimensional Analysis: Research Methods and Current Issues. London: Bloomsbury Academic. 11-26. https://doi.org/10.5040/9781350023857
Biber, Douglas & Jesse Egbert (2016). Register variation on the searchable web: A multi-dimensional analysis. Journal of English Linguistics 44 (2): 95-137. https://doi.org/10.1177/0075424216628955
Biber, Douglas & Jesse Egbert (2018). Register Variation Online. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781316388228
Biber, Douglas & Bethany Gray (2013). Identifying multi-dimensional patterns of variation across registers. In: Manfred Krug & Julia Schluter (Eds.). Research Methods in Language Variation and Change. Cambridge: Cambridge University Press. 402-420. https://doi.org/10.1017/CBO9780511792519.026
Bohmann, Axel (2019). Variation in English Worldwide: Registers and Global Varieties. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108751339
Bohmann, Axel (2021). Register in world Englishes research. In: Britta Schneider, Theresa Heyd & Mario Saraceni (Eds.). Bloomsbury World Englishes Volume 1: Paradigms. London: Bloomsbury Academic. 80-96.
Candarli, Duygu (2022). Linguistic characteristics of online academic forum posts across subregisters, L1 backgrounds, and grades. Lingua 267: 103190. https://doi.org/10.1016/j.lingua.2021.103190
Costello, Anna B. & Jason Osborne (2005). Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis. Practical Assessment, Research, and Evaluation 10 (7): 1-9. https://doi.org/10.7275/JYJ1-4868
Council of Europe (Ed.). (2020). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Companion volume. Strasbourg: Council of Europe Publishing. [online] https://rm.coe.int/common-european-framework-of-reference-for-languages-learning-teaching/16809ea0d4 [Feb. 2022].
Editors of duden.de (n.d.). Schulenglisch. Duden online. [online] https://www.duden.de/rechtschreibung/Schulenglisch [April 2022].
Freudenstein, Reinhold (2001). Fremdsprachen lernen ohne Lehrbuch: Zur Geschichte, Gegenwart und Zukunft fremdsprachlicher Unterrichtsmaterialien. Praxis des neusprachlichen Unterrichts 48 (1): 8-19.
Friginal, Eric & Jennifer Roberts (2022). Corpora for Materials Design. In: Reka R. Jablonkai & Eniko Csomay (Eds.). The Routledge Handbook of Corpora and English Language Teaching and Learning. London: Routledge. 131-146.
Goulart, Larissa & Margaret Wood (2021). Methodological synthesis of research using multi-dimensional analysis. Journal of Research Design and Statistics in Linguistics and Communication Science 6 (2): 107-137. https://doi.org/10.1558/jrds.18454
Grau, Maike (2009). Worlds apart? English in German youth cultures and in educational settings. World Englishes 28 (2): 160-174. https://doi.org/10.1111/j.1467-971X.2009.01581.x
Hessisches Kultusministerium (2010). Lehrplan Englisch: Gymnasialer Bildungsgang, Jahrgangsstufen 5G bis 9G. [online] https://kultusministerium.hessen.de/sites/kultusministerium.hessen.de/files/2021-06/g8-englisch.pdf [Feb. 2022].
Jakubíček, Miloš, Adam Kilgarriff, Vojtěch Kovář, Pavel Rychlý & Vít Suchomel (2013). The TenTen corpus family. 7th International Corpus Linguistics Conference (Lancaster University). 125-127.
Kaiser, Henry F. & John Rice (1974). Little Jiffy, Mark IV. Educational and Psychological Measurement 34 (1): 111-117.
Kruger, Haidee & Bertus van Rooy (2018). Register variation in written contact varieties of English: A multidimensional analysis. English World-Wide. A Journal of Varieties of English 39 (2): 214-242. https://doi.org/10.1075/eww.00011.kru
Kultusministerkonferenz (2003). Bildungsstandards für die erste Fremdsprache (Englisch/Französisch) für den Mittleren Schulabschluss. Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland. [online] https://www.kmk.org/fileadmin/veroeffentlichungen_beschluesse/2003/2003_12_04-BS-erste-Fremdsprache.pdf [Jan. 2017].
Kultusministerkonferenz (2012). Bildungsstandards für die fortgeführte Fremdsprache (Englisch/Französisch) für die Allgemeine Hochschulreife. Beschluss der Kultusministerkonferenz vom 18.10.2012. Sekretariat der Ständigen Konferenz der Kultusminister der Länder in der Bundesrepublik Deutschland. [online] http://www.kmk.org/bildung-schule/qualitaetssicherung-in-schulen/bildungsstandards/dokumente.html [November 2018].
Kurtz, Jürgen (2019). Lehrwerkgestütztes Fremdsprachenlernen im digitalen Wandel. In: Eva Burwitz-Melzer, Claudia Riemer & Lars Schmelter (Eds.). Das Lehren und Lernen von Fremd- und Zweitsprachen im digitalen Wandel: Arbeitspapiere der 39. Frühjahrskonferenz zur Erforschung des Fremdsprachenunterrichts. Tübingen: Narr Francke Attempto. 114-125.
Le Foll, Elen (2020). Issues in Compiling and Exploiting Textbook Corpora. Presented at the Japanese Association for English Corpus Studies 2020, Tokyo. https://doi.org/10.13140/RG.2.2.32006.60487
Le Foll, Elen (2021a). Register Variation in School EFL Textbooks. Register Studies 3 (2): 207-246. https://doi.org/10.1075/rs.20009.lef
Le Foll, Elen (Ed.). (2021b). Creating Corpus-Informed Materials for the English as a Foreign Language Classroom: A step-by-step guide for (trainee) teachers using online resources. Open Educational Resource. [online] https://pressbooks.pub/elenlefoll [March 2024].
Le Foll, Elen (2021c). The multi-feature tagger of English (MFTE) in Perl v.3.0. [online] https://github.com/elenlefoll/MultiFeatureTaggerEnglish [March 2024].
Le Foll, Elen (2022a). “I’m putting some salt in my sandwich.” The use of the progressive in EFL textbook conversation. In: Susanne Flach & Martin Hilpert (Eds.). Broadening the Spectrum of Corpus Linguistics: New approaches to variability and change. Amsterdam: John Benjamins. 93-132. https://doi.org/10.1075/scl.105.04lef
Le Foll, Elen (2022b). Textbook English: A Corpus-Based Analysis of the Language of EFL textbooks used in Secondary Schools in France, Germany and Spain. PhD thesis, Osnabrück University. https://doi.org/10.48693/278
Le Foll, Elen (to appear). Textbook English: A Multi-Dimensional Approach. Amsterdam: John Benjamins.
Limberg, Holger (2016a). “Always remember to say please and thank you”: Teaching politeness with German EFL textbooks. In: Kathleen Bardovi-Harlig & César Félix-Brasdefer (Eds.). Pragmatics & Language Learning. Honolulu, HI: University of Hawai‘i, National Foreign Language Resource Center. 265-292.
Limberg, Holger (2016b). Teaching how to apologize: EFL textbooks and pragmatic input. Language Teaching Research 20 (6): 700-718. https://doi.org/10.1177/1362168815590695
Loewen, Shawn & Talip Gonulal (2015). Exploratory factor analysis and principal components analysis. In: Luke Plonsky (Ed.). Advancing quantitative methods in second language research. New York: Routledge. 182-211.
Love, Robbie, Claire Dembry, Andrew Hardie, Vaclav Brezina & Tony McEnery (2017). The spoken BNC2014. International Journal of Corpus Linguistics 22 (3): 319-344. https://doi.org/10.1075/ijcl.22.3.02lov
Martinez, Ron & Norbert Schmitt (2012). A Phrasal Expressions List. Applied Linguistics 33 (3): 299-320. https://doi.org/10.1093/applin/ams010
Mindt, Dieter (1987). Sprache, Grammatik, Unterrichtsgrammatik: futurischer Zeitbezug im Englischen. Frankfurt am Main: Diesterweg.
Mindt, Dieter (1992). Zeitbezug Im Englischen: Eine didaktische Grammatik des englischen Futurs (Tübinger Beiträge Zur Linguistik). Tübingen: Gunter Narr Verlag.
Mindt, Dieter (1995). An Empirical Grammar of the English Verb: Modal Verbs. Berlin: Cornelsen.
Mindt, Dieter (2000). An Empirical Grammar of the English Verb System. Berlin: Cornelsen Verlag.
Mindt, Dieter (2005). Schulenglisch mangelhaft: Wie lange noch endlich? In: Thomas Herbst (Ed.). Linguistische Dimensionen des Fremdsprachenunterrichts. Königshausen & Neumann. 430-452.
Möller, Verena (2020). From pedagogical input to learner output: Conditionals in EFL and CLIL teaching materials and learner language. Pedagogical Linguistics 1 (2): 95-124. https://doi.org/10.1075/pl.00001.mol
Neumann, Stella & Stephanie Evert (2021). A register variation perspective on varieties of English. In: Elena Seoane & Douglas Biber (Eds.). Corpus-based approaches to register variation. Amsterdam: Benjamins. 144-178.
Römer, Ute (2004a). A corpus-driven approach to modal auxiliaries and their didactics. In: John McH. Sinclair (Ed.). How to Use Corpora in Language Teaching. Amsterdam: John Benjamins. 185-99. https://doi.org/10.1075/scl.12.14rom
Römer, Ute (2004b). Comparing real and ideal language learner input: The use of an EFL textbook corpus in corpus linguistics and language teaching. In: Guy Aston, Silvia Bernardini & Dominic Stewart (Eds.). Studies in Corpus Linguistics Volume 17. Amsterdam: John Benjamins. 151-168.
Römer, Ute (2005). Progressives, Patterns, Pedagogy: A Corpus-Driven Approach to English Progressive Forms, Functions, Contexts, and Didactics. Amsterdam: John Benjamins.
Rühlemann, Christoph (2008). A register approach to teaching conversation: Farewell to Standard English? Applied Linguistics 29 (4): 672-93. https://doi.org/10.1093/applin/amn023
Schlüter, Norbert (2002). Present perfect: eine korpuslinguistische Analyse des englischen Perfekts mit Vermittlungsvorschlägen für den Sprachunterricht. Tübingen: Narr.
Schönemann, Peter H. (1990). Facts, fictions, and common sense about factors and components. Multivariate Behavioral Research 25 (1): 47-51. https://doi.org/10.1207/s15327906mbr2501_5
Siepmann, Dirk (2007). Wortschatz und Grammatik: Zusammenbringen, was zusammengehört. Beiträge zur Fremdsprachenvermittlung 46: 59-80.
Siepmann, Dirk (2014). Zur Repräsentation von Mehrwortausdrücken in deutschen Lehrwerken des Englischen. In: Dirk Siepmann & Christoph Bürgel (Eds.). Sprachwissenschaft und Fremdsprachenunterricht: Spracherwerb und Sprachkompetenzen im Fokus. Baltmannsweiler: Schneider Verlag Hohengehren.
Thornbury, Scott (2002). Training in instructional conversation. In: Hugh Trappes-Lomax & Gibson Ferguson (Eds.). Language in Language Teacher Education. Amsterdam: John Benjamins. 95-106.
Velicer, Wayne F. & Douglas N. Jackson (1990a). Component analysis versus common factor analysis: Some issues in selecting an appropriate procedure. Multivariate Behavioral Research 25 (1): 1-28. https://doi.org/10.1207/s15327906mbr2501_1
Velicer, Wayne F. & Douglas N. Jackson (1990b). Component Analysis versus Common Factor Analysis: Some further observations. Multivariate Behavioral Research 25 (1): 97-114. https://doi.org/10.1207/s15327906mbr2501_12
Winter, Tatjana & Elen Le Foll (2022). Testing the pedagogical norm: Comparing if-conditionals in EFL textbooks, learner writing and English outside the classroom. International Journal of Learner Corpus Research 8 (1): 31-66. https://doi.org/10.1075/ijlcr.20021.win
Zuur, Alain F., Elena N. Ieno, Neil Walker, Anatoly A. Saveliev & Graham M. Smith (2009). Mixed Effects Models and Extensions in Ecology with R. New York, NY: Springer New York.

Elen Le Foll
University of Cologne

Online appendices

Data, code and plots are available under a CC-BY-SA license from: https://osf.io/n3fz9/

Appendix 1: Data preparation R notebook.
Appendix 2: Table of the MFTE features entered in the analysis.
Appendix 3: Analysis code and detailed results R notebook.
Appendix 4: Three-dimensional version of Fig. 2.