Theory and practice of corpus-based semantics
1217
2013
978-3-8233-7754-2
978-3-8233-6754-3
Gunter Narr Verlag
Dr. Nikola Dobric
The book is a clear yet challenging overview of both the theoretical and the practical aspects of doing semantic research through the use of language corpora. Via a very hands-on approach it presents the relevant semantic and corpus linguistic issues in an accessible way and aims at providing a very practical experience of doing advanced corpus-based research.
<?page no="0"?>
Buchreihe zu den Arbeiten aus Anglistik und Amerikanistik
Nikola Dobrić
Theory and practice of corpus-based semantics
<?page no="1"?>
Theory and practice of corpus-based semantics
<?page no="2"?>
Buchreihe zu den Arbeiten aus Anglistik und Amerikanistik
Edited by Alwin Fill, Walter Grünzweig, Walter Hölbling, Allan James, Bernhard Kettemann, Andreas Mahler, Christian Mair, Annemarie Peltzer-Karpf, Werner Wolf
Volume 25
<?page no="3"?>
Nikola Dobrić
Theory and practice of corpus-based semantics
<?page no="4"?>
Bibliographic information of the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.dnb.de.
Published with support of the Forschungsrat of Universität Klagenfurt with sponsoring provided by Privatstiftung Kärntner Sparkasse, Faculty of Humanities at Universität Klagenfurt, and Kärntner Universitätsbund.
© 2014 · Narr Francke Attempto Verlag GmbH + Co. KG
Dischingerweg 5 · D-72070 Tübingen
This work, including all of its parts, is protected by copyright. Any use outside the narrow limits of copyright law without the consent of the publisher is prohibited and punishable by law. This applies in particular to reproductions, translations, microfilming, and storage and processing in electronic systems.
Printed on acid-free and age-resistant paper.
Internet: http://www.narr.de
E-Mail: info@narr.de
Printed in Germany
ISSN 0939-8481
ISBN 978-3-8233-6754-3
<?page no="5"?>
TO MARINA AND FILIP
<?page no="7"?>
Contents

List of tables ... 9
List of figures ... 10
Acknowledgements ... 11
1. Introduction ... 13
2. The theory of corpus-based semantics ... 18
  2.1 Early developments ... 19
    2.1.1 Historical-philological Semantics ... 20
    2.1.2 Structuralist Semantics ... 21
      2.1.2.1 Lexical field theory ... 22
    2.1.3 Neostructuralist Semantics ... 23
      2.1.3.1 Generative Lexicon ... 24
      2.1.3.2 Meaning-Text theory ... 25
      2.1.3.3 Distributional corpus analysis ... 27
  2.2 Fully usage-based approaches ... 28
    2.2.1 Construction Grammar and Frame Semantics ... 29
    2.2.2 Pattern Grammar ... 30
    2.2.3 Emergent Grammar ... 31
    2.2.4 Situational Semantics ... 32
  2.3 The common theoretical ground ... 33
3. The practice of corpus-based semantics ... 36
  3.1 Steps in corpus-based semantic research ... 37
  3.2 Polysemy ... 39
    3.2.1 The traditional view of polysemy ... 39
      3.2.1.1 Representation according to Neostructuralist Semantics ... 40
      3.2.1.2 Representation according to Cognitive Semantics ... 41
      3.2.1.3 Representation according to Lexical Pragmatics ... 44
    3.2.2 A usage-based account of polysemy ... 45
  3.3 Corpus-based semantic analysis - Steps 1 to 3 ... 48
    3.3.1 The corpus sample ... 48
    3.3.2 The verb look ... 50
    3.3.3 The tagset ... 51
    3.3.4 Setting up the analysis ... 57
4. Corpus-based semantics and sense distinctness ... 59
  4.1 Sense distinctness ... 59
    4.1.1 Human sense disambiguation ... 59
    4.1.2 Computer-based sense disambiguation ... 60
<?page no="8"?>
      4.1.2.1 Early machine translation approaches ... 60
      4.1.2.2 Corpus-based methods ... 62
      4.1.2.3 Problems of computer-based approaches ... 63
    4.1.3 The fully corpus-based disambiguation ... 64
  4.2 Qualitative analysis ... 65
  4.3 Quantitative analysis ... 74
    4.3.1 Distinctiveness and predictive power ... 78
  4.4 Evaluation of the results for sense distinctness ... 85
  4.5 The theoretical and practical implications ... 91
5. A corpus-based account of prototypicality ... 95
  5.1 Prototypicality of senses ... 96
  5.2 Quantitative analysis ... 97
  5.3 Evaluation of the results for prototypicality ... 105
  5.4 Theoretical and practical implications ... 115
6. A corpus-based account of sense networks ... 120
  6.1 Quantitative analysis ... 123
  6.2 Evaluation of the lexical network construction ... 131
    6.2.1 Beta-level senses ... 131
    6.2.2 Gamma-level senses ... 134
    6.2.3 Delta-level senses ... 136
    6.2.4 Epsilon-level senses ... 137
    6.2.5 Evaluation through historical development ... 138
  6.3 Theoretical and practical implications ... 140
7. Discussion ... 143
  7.1 Overall theoretical and practical implications ... 144
  7.2 Implications for future research ... 147
Bibliography ... 149
Sources ... 160
Appendices ... 162
  1 The list of all senses with the corresponding code used in the book ... 162
  2 List of the raw frequencies of occurrences of all of the predefined features per sense ... 163
  3 List of the raw frequencies of occurrences of all of the non-predefined features per sense ... 171
  4 Sum of all token and type values of the attested contextual features per sense ... 177
  5 List of the contextual features with the values of their distinctiveness ... 178
  6 List of the predictive powers of all of the predefined features ... 181
  7 List of the predictive powers of all of the non-predefined features ... 192

<?page no="9"?>
List of tables

Table 3.1 Senses of the verb look compiled from dictionaries ... 52
Table 3.2 The tagset used in the analysis of the verb look ... 57
Table 4.1 Ranked raw frequencies and ranks of occurrences of the 35 attested senses of the verb look ... 75
Table 4.2 Senses ranked by feature types ... 76
Table 4.3 Senses ranked by feature tokens ... 77
Table 4.4 The ten most widely occurring and ten least widely occurring predefined feature types ... 78
Table 4.5 The ten most frequently occurring and ten least frequently occurring predefined feature tokens ... 79
Table 4.6 The possible interplay of distinctiveness and frequency in constituting predictive power values ... 82
Table 4.7 Ordered prediction ratios constrained within 35 senses alongside the maximal possible prediction ratios for each sense ... 84
Table 4.8 The number of identified types of features (out of the 12 possible attested ones) in the example sentence [43] ‘He tossed his horse's reins to a groom and went storming off looking for Dacourt.’ for each of the senses they co-occur with ... 87
Table 4.9 The list of ordered prediction ratios of the potential senses in the example sentence [43] ‘He tossed his horse's reins to a groom and went storming off looking for Dacourt.’ ... 89
Table 5.1 Raw frequencies and ranks of the 35 attested senses ... 100
Table 5.2 Ranked frequencies of the features per sense with feature ratios ... 101
Table 5.3 Overlaps of the contextual features amongst all of the senses ... 103
Table 5.4 Overlap instances and number of overlapped features - the overlap ratios ... 104
Table 5.5 Ranking of senses according to their prototypicality - prototypicality ratios ... 108
Table 5.6 Coarsely-grained hierarchy of senses based on the qualitative analysis ... 110
Table 5.7 The order of etymological origins of the 35 identified senses of the verb look according to OED ... 113
Table 5.8 Correspondences between the corpus-attested levels of prototypicality of senses of the verb look, the qualitative analysis of the category membership, and their first recorded usages ... 114
Table 6.1 The hierarchical levels of senses (in code) and their etymological order of origin (years of first recorded uses marked by Y) ... 139

<?page no="10"?>
List of figures

Figure 4.1 The first clustering of senses centering on ‘direct your gaze towards someone or something or in a specified direction’ ... 69
Figure 4.2 The second clustering of senses centering on ‘examine and consider’ ... 71
Figure 4.3 The third clustering of senses centering on ‘attempt to find’ ... 71
Figure 4.4 The fourth clustering of senses centering on ‘have the appearance of’ ... 72
Figure 4.5 The total non-criteria-based division of the senses of the verb look ... 73
Figure 5.1 Qualitative account of the links between the senses of the verb look ... 107
Figure 6.1 Hierarchical tree-structure of a lexical network (adapted from Collins and Quilliam 1969) ... 121
Figure 6.2 Levels of prototypicality of the 35 identified senses of the verb look based on their attested prototypicality ratios ... 124
Figure 6.3 The hierarchical semantic network based on the attested prototypicality and the highest overlap points of the contextual features ... 126
Figure 6.4 The amended hierarchical semantic network based on the attested prototypicality and the highest overlap points of the contextual features ... 128
Figure 6.5 The arbitrary network construction based on the highest overlap points of the contextual features ... 130

<?page no="11"?>
Acknowledgements

Reflecting on everything that needed to be done for me to reach the point of completing this book, it would be immodest of me to attribute the work to myself alone. As with most other scientific endeavors, there are many people standing behind the research presented within. The first who must be credited for their contribution are the people who supervised my doctoral thesis, which served as the basis of this book. I will always be indebted to my first supervisor, Allan James, for his constant encouragement, the great amount of freedom he gave me, and the confidence he had in my work. I would also like to express my sincere gratitude to my second supervisor, Veronica Smith, for her consistent, timely, and constructive feedback and suggestions, which guided me to focus on specific problems arising during my work.
Furthermore, sincere thanks go to Zdravko Šolak, an academic who has given me sound advice throughout my scientific career; to Alexander Onysko for his suggestions and corrections; to Günther Sigott for his most honest comments; to Wolfgang Teubert for his valuable input; and to Randall Major for inspiring me to always strive for more and for his contribution to the language quality of the text. Special thanks also go to the series editor, Bernhard Kettemann, for his faith in the publication, his advice, and his unique contribution to the final structure of the book.

I would also like to thank the Alpen-Adria Universität Klagenfurt, more precisely the Forschungsrat of the University (Science Council) and the Sondermittel der Fakultät für Kulturwissenschaften (Special Funds of the Faculty of Humanities), for the financial and academic support which enabled me to run the CASIS project for more than two years and to gather all the required data in a timely fashion, and, together with the Kärntner Universitätsbund (Carinthian University Association), for financing this publication. My gratitude also extends to all of the people who participated in the project: the advanced students who manually processed the corpus data - Alexandra Galler, Verena Novak, Pamela Prohaska, and Tjaša Žemlja; the very competent programmer (and a dear friend) Thomas Hainscho for his contribution; and the extremely knowledgeable statistician Herman Cesnik for his aid with statistical processing and work with SPSS.

In addition to the academic support mentioned above, I also want to thank my family. My parents’ constant support and the constant love and encouragement of my wife and my son, which accompanied me on this thorny road, made the process much more enjoyable than it could otherwise have been.

<?page no="13"?>
1. Introduction

Coming from a background in corpus linguistics, I recall being thrilled a few years ago while reading a paper entitled Corpus-based Methods and Cognitive Semantics: The Many Senses of To Run by Stefan Th. Gries (2006). The paper heralded a novel methodology for looking at polysemous lexemes, termed Behavioral Profiling. It was to bridge the gap between corpus and cognitive linguistics and usher in a new era of semantic and lexicographic research based on quantitative corpus data. This apparently new methodology of what is, effectively, corpus-based semantics was to bring about the sorely missed solid criteria with which to firmly ground semantic meaning. It was intended to complement the more introspective qualitative (in this case cognitive semantic) accounts.

Revolving around the famous stipulation “you shall know a word by the company it keeps” (Firth 1957: 11), the criteria that would ensure such a more empirical account of lexical meanings were to be the contextual features co-occurring meaningfully with different readings of lexemes. They were conceived as outlining the morpho-syntactic and semantic context of a sentence or a clause (the microcontext) containing the given lexeme. Following the premise that a change in contextual distribution also signifies a difference in meaning (Cruse 1986: 1), Gries (2006) attested and statistically processed a wide array of contextual features (termed ID tags in the paper after Atkins (1987: 24)) surrounding the polysemous lexeme run. The aim was to demonstrate how its distributional properties can be used to distinguish between its senses, to attest its prototypical sense and the levels of prototypicality of the other senses, and to help structure the lexeme’s lexical (semantic) network. All this was done with an emphasis on solid, empirical, quantifiable corpus data coupled with a very high level of statistical processing.
It need not be said how enthusiastic I was about the whole matter, since I believe I suffered (and probably still do) from the same ailment every linguist suffers from: we all find it hard to accept that certain (if not all) areas of human language are limitless in their scope and lack solid uniformity, semantics being one of the most open and scientifically (in the sense of the natural sciences and their standards) unaccountable ones. No amount of insistence on linguistics being the same as, or even similar to, natural sciences such as mathematics or chemistry will bring the desired order, regularity, and sturdiness to our account of language. It seems that perhaps the best we can hope for is to investigate language and its use(s) as <?page no="14"?> thoroughly as possible, discover all of the possible regularities, and note and admit the unaccountable vagaries.

Be that as it may, at the time I was fascinated by this apparently fresh possibility of a fully criteria-based description of word senses, even if it was designed to complement rather than replace the more traditional qualitative (and almost fully subjective) methods of accounting for lexical meaning. Fully agreeing, then, with the possibilities that quantitative corpus data and their statistical processing may offer the lexicographic and semantic community, I was determined to follow up on the paper’s conclusion that the method needed to be utilized further and tested on a larger corpus sample (using more than the 815 citations presented in Gries (2006)). Needless to say, I was fully confident in the ultimate success of the procedure, particularly of its sense disambiguation part. So, being at the onset of my doctoral studies at the time, I decided it would be a worthwhile topic to investigate. The research I had in mind (and which is presented in full in this book) was to be based, following the pattern set by Gries (2006), on one very polysemous lexeme in English.
After some deliberation and comparison with similar instances of semantic analyses, the choice was made: the verb look in its entire paradigm. The first proof that it was a good choice came from the fact that no similar investigation of this lexeme was found during the preparatory stage. The second confirmation came from the fact that the verb look is indeed a very polysemous lexeme, as even a glance at some of the relevant dictionaries of English (such as the Oxford English Dictionary) yielded in excess of 40 different and (seemingly) distinct senses. With the focus of the research thus settled, I felt that the best way to proceed with the project was not to take all of the premises put forward by Gries (2006) for granted but rather to first perform a small pilot research project replicating the procedure described in that paper.

The pilot research (Dobrić 2010) was conducted on a random sample of 100 sentences (taken from the Corpus of Contemporary American English (COCA)) containing the noun bachelor (which is quite famous for its recurrence in lexical semantic research). Five senses of the lexeme bachelor were extracted from the New Oxford Dictionary of English (NODE):

‘a male bird or mammal without a mate, especially one prevented from breeding by a dominant male’;
‘a young knight serving under another’s banner (archaic)’;
‘a person who holds a first degree from a university or other academic institutions’;
‘an apartment occupied or suitable for one person’; and
‘a man who is not and has never been married’.

<?page no="15"?>
The contextual features chosen for attesting represented a commonly used cross-section of the ontology usually found in similar linguistic analyses. Their choice was also largely conditioned by their ease of identification, since the large sample to be subsequently processed for the verb look would require eliciting outside help.
Their ease of identification was also meant to facilitate greater inter-annotator agreement, which would mean a higher degree of uniformity of annotation. The tags were as follows:

morphological features: plural, singular; possessive, of genitive, ellipsis; simple, compound; countable, uncountable;
semantic characteristics of the lexeme: human, animal, concrete, machine, location, quantity;
syntactic properties of the sentence/clause it appears in: subject, object; transitive, intransitive; tense, voice; main, subordinate clause;
morpho-semantic characteristics of co-occurring elements: subject, object of the clause being plural or singular; possessive, of genitive, ellipsis; simple, compound; countable, uncountable; human, animal, concrete, machine, location, quantity; and
R1 (one place to the right of the lexeme) and L1 (one place to the left of the lexeme) collocates.

The pilot investigation involved creating a behavioral profile of the noun bachelor (again after Gries (2006)). In essence, this meant collecting all of the relevant distributional information about the lexeme and then using the correlations between the contextual features and the senses in an attempt to define the different readings of one lexeme. The creation of the behavioral profile included several steps:

after extracting the said 100 random sentences from COCA, each sentence (or clause) containing the target word was analyzed manually for the listed contextual features;
the features were counted and correlated with each sense they co-occurred with;
the collected features and their co-occurrences were then inspected in terms of being distinctive and sufficiently frequent; and
once the data was processed in this fashion, the procedure was reversed, looking first at the distinctive features and seeing how they in turn condition the readings, and the results were obtained.
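The counting and inspection steps of such a behavioral profile can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the four annotated citations, the sense labels, and the function names `build_profile` and `exclusive_features` are invented for the example and are not the data or the tooling of the pilot study.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made annotations: (sense label, set of contextual features).
# Labels and feature names are illustrative only, not the pilot study's data.
citations = [
    ("unmarried man", {"singular", "human", "subject"}),
    ("unmarried man", {"singular", "human", "object"}),
    ("degree holder", {"singular", "human", "possessive"}),
    ("apartment", {"singular", "location", "object"}),
]

def build_profile(citations):
    """Count how often each contextual feature co-occurs with each sense."""
    profile = defaultdict(Counter)
    for sense, features in citations:
        profile[sense].update(features)
    return profile

def exclusive_features(profile, sense):
    """Return the features attested with `sense` and with no other sense."""
    seen_elsewhere = set()
    for other, counts in profile.items():
        if other != sense:
            seen_elsewhere.update(counts)
    return set(profile[sense]) - seen_elsewhere

profile = build_profile(citations)
print(profile["unmarried man"]["human"])                      # 2
print(sorted(exclusive_features(profile, "unmarried man")))   # ['subject']
```

In this toy data the feature "singular" occurs with every sense and is therefore useless for telling them apart, whereas "subject" occurs with one sense only. A real profile would additionally have to check that an exclusive feature occurs often enough to be relied upon.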
<?page no="16"?>
And the results were very promising indeed, albeit based on a very small sample of noisy data (Dobrić 2010: 104):

“If we consider an example of the ‘a man who is not and has never been married’ sense of bachelor, we can see how the model would work: 32% (54) of all the attested instances of bachelor are of this sense, which automatically gives the would-be software the 32% probability to hit the right sense out of any number of citations; 31% (52) of all the instances of the word of this sense are animate, concrete, and human, which raises the predictability level to 63%; and 30% (50) of all of examples of bachelor are simple, nonpossessive instances, bringing the software to a 93% of sense recognition accuracy.”

In practical terms, the results indicated that by discovering only three contextual features which co-occur exclusively with the ‘a man who is not and has never been married’ sense (and thus co-occur with it strongly), we can predict the appearance of this sense with a 93% likelihood, given a choice of the five possible ones, in any context attested as containing those three distinctive features. After the initial thrill at the striking result, the pilot research had something else to add in order to curb that enthusiasm a little: “It is true that the procedure becomes more complicated with more polysemous words (exhibiting dozens of senses and sharing more features, making them less distinctive [...]” (Dobrić 2010: 104). The fact is that when dealing with only five senses, nearly half of which are also either rare or archaic (such as ‘an apartment occupied or suitable for one person’, marked in NODE as Canadian, or ‘a young knight serving under another’s banner’, marked as archaic), it is easy to get such a clear result. What would happen if the same was attempted with the 40 or so senses of the verb look, intended to be analyzed in more than 13,000 sentences/clauses and with a somewhat more elaborate ontology?
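The running total in the passage quoted from the pilot study can be retraced step by step. The following minimal sketch only reproduces the additive arithmetic of the quotation (32%, then 63%, then 93%). The variable names and step labels are illustrative assumptions rather than part of the pilot study, and the sketch makes no claim about how the individual percentages were derived.

```python
from itertools import accumulate

# Shares quoted from the pilot study (Dobrić 2010: 104) for the sense
# 'a man who is not and has never been married'; the labels are paraphrases.
steps = [
    ("instances of bachelor with this sense", 32),
    ("animate, concrete, and human instances", 31),
    ("simple, non-possessive instances", 30),
]

# Running total of the quoted percentages, in the order given in the quotation.
running_totals = list(accumulate(pct for _, pct in steps))

for (label, pct), total in zip(steps, running_totals):
    print(f"+{pct}% ({label}) -> {total}%")
```

The last line printed ends in 93%, the figure reported in the quotation.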
However, before I could embark on thoroughly investigating the described issues and answering this question, I first had to find out what the (theoretical) reasoning behind this high level of correlation between contextual patterns and particular readings of polysemous lexemes would be. This brings us to the structure of the book as indicated by its title: the first major part (Chapter 2) discusses the theoretical grounding of <?page no="17"?> the distributional nature of lexical meaning and lists the prominent linguistic (semantic) paradigms that have played a role in arriving at the given account; the second major part (Chapters 3 to 6) presents the practical application of the methodology to the said example of the verb look; and the last part (Chapter 7) sums up the findings and discusses the results of the whole exercise.

<?page no="18"?>
2. The theory of corpus-based semantics

To start off our theoretical account with the very basics, I would like to draw your attention to the three examples below. We would surely agree that it is safe to assume that any sufficiently proficient speaker of English would have no problem understanding (disambiguating) the meanings of the lexeme look within them:

[1] Marina’s look was scrutinizing him.
[2] Marina’s look was very fashionable.
[3] Marina had a look of utter amazement.

However, if asked to provide clear and attestable criteria for why the noun look in [1] can be paraphrased as ‘the directed gaze’ while in [2] it reads ‘physical appearance’, and what, also in terms of criteria, makes them different, even the most experienced linguists would be very hard pressed to do so. Now, the question that naturally arises here is why that should be so difficult. If we were to look at some other scientific disciplines and their accounts of their principal topics of interest, we would see that this is not the case there.
If we take chemistry as an example, with its account of what constitutes water and of the criteria that define how it differs from vapor or ice, we can instantly understand how odd the lack of such a simple possibility of formalization is in linguistics (or rather semantics). To understand further how challenging it is to answer this question in semantics, we must realize that the difficulty of providing decisive criteria for defining word senses and clearly discriminating between them has always been a burning issue of lexical semantics and a stumbling block for many a linguistic theory to date. It is admittedly problematic to the point that it fundamentally calls into question the possibility of providing a clear and final account of word meanings. The problem becomes even more obvious when we look, for example, at the discrepancies between the paraphrases and the number of senses of one and the same lexeme given in different dictionaries, despite the fact that they usually reference each other (Teubert 2005: 11) and are based on similar corpus data. We can also witness the same problem if we try to use any commercially available machine translation (MT) software (Guerberof Arenas 2010: 4-5).

However, as mentioned in the introductory chapter, we have relatively recently witnessed the resurrection of a statistical view of lexical meanings promising to supply us with exactly such solid criteria. It is to <?page no="19"?> help us describe the distinctions between different readings of one lexeme more precisely, more uniformly, and more transparently. Based on corpus data, this view of language in essence revolves around the influence of context in conditioning word senses, and it finds its footing in quantifying the nature and strength of that influence. The (corpus-based semantic) methodology has its roots in various lexical semantic theories going all the way back to the onset of systematic linguistic research.
It has been more immediately realized through several approaches which can all be gathered under the term exemplar-based models (Gries 2010: 337) or usage-based models. They have developed, as indicated, within different semantic frameworks, which include Construction Grammar (Goldberg 1995; Croft 2001; Lakoff 1987; Fillmore 1985; Kay and Fillmore 1999); Pattern Grammar (Hunston and Francis 1999); Emergent Grammar (Hopper 1987; Bybee 1998); and Situation Semantics (Barwise and Perry 1983). However, even though these exemplar-based/usage-based models of language and lexical meaning appear relatively novel, the idea permeating them is somewhat less recent. This chapter therefore charts the development of this idea, from its first appearance all the way to the contemporary form ultimately adopted by our research.

2.1 Early developments

The view of lexical meaning propagated by corpus-based semantics has its origins in the very beginning of organized linguistic/semantic research. It was voiced early on, then discarded, then brought back and revamped, discarded again and resurrected once more, and mostly presented as novel each time it reappeared (as is, regrettably, so often the case in the humanities). It formally starts with Historical-philological Semantics, Hermann Paul (1920 [1880]), and his book entitled Prinzipien der Sprachgeschichte. The idea took on a slightly different form in the deliberations of Structuralist Semantics and of Walter Porzig (1934), Leo Weisgerber (1927), and Jost Trier (1931 [1971]). Neostructuralist Semantics followed suit with its own take on the idea, given by, among others, James Pustejovsky (1995) and Igor Mel’čuk (1996).

<?page no="20"?>
2.1.1 Historical-philological Semantics

Historical-philological Semantics marks the beginning of methodological semantic investigation and can be traced to the period between 1830 and 1930 and the (re-)emergence of interest in word meaning.
As is rather well known, linguistics (or rather philology) at the time¹ was primarily interested in the historical development of words and dealt almost entirely with diachronic semantic change. The first major catalyst of transformation bringing about the rise of interest in synchronic semantics was the tradition of teaching rhetoric. Alongside instruction in writing, composition, and tone, stylistic figures held a prominent place in its curriculum. It was that component which was to prove vital for the development of semantics (Geeraerts 2010: 5). It developed a range of concepts referring to various figures of speech, some of them legitimate semantic phenomena such as euphemism, metaphor, and metonymy. The rhetorical tradition is especially important because it brokered recognition of the relevance of the given phenomena (most importantly metaphors) in everyday speech (Du Marsais 1730) rather than only seeing them in the context of oratory or literary works. The second piece of the puzzle is that the ultimate emergence of semantics came through the practice of lexicography. For their research base the new semanticists turned to dictionary makers. The appearance of the first monolingual dictionaries made a multitude of ready-made examples available for research purposes (Considine 2008). By working with this empirical database, which was of enormous methodological significance, Historical-philological Semantics sought to further reinforce the scientific systematicity of semantics (and linguistics) itself. New ideas of major importance appeared: one was the introduction of the psychological conception of meaning (Hecht 1888) and the other was the recognition of the role of context in semantics (Paul 1920 [1880]).
Focusing on the recognition of the role of context, as the more relevant of the two for our topic at hand, the most important step Hermann Paul made was to divide the meanings of a lexeme into ‘usual’ and ‘occasional’ (usuelle and okkasionelle Bedeutung) (1920 [1880]). The usual meaning is the one we would find listed first (as the most frequently used) in a dictionary. It should be seen as known and established in the community of speakers of the given language. In the words of Hermann Paul it denotes “the total representational content that is associated with a word for any member of the speech community” (Paul 1920 [1880]: 75; as cited by Geeraerts 2010: 15). The occasional meaning then represents any deviation from that primary (prototypical, if you will) meaning. The deviations themselves are caused by the lexeme’s use. It is “the representational content that an interlocutor associates with a word when he uses it, and which he expects the hearer to associate with the word as well” (Paul 1920 [1880]: 75; as translated and cited by Geeraerts 2010: 15). Out of this division of word senses came the natural conclusion that context is the most important factor in understanding occasional meanings. Paul illustrates the way context helps us, or rather makes us, select a particular reading out of the many a polysemous lexeme may have with the example of Blatt in German. According to him the noun (Blatt) is to be seen as having a completely different interpretation in the surroundings of a bookshop (where it would be a sheet of paper) as opposed to the surroundings of a park (in which case it would denote a leaf) (Paul 1920 [1880]: 77, as cited by Geeraerts 2010: 15).

1 Mostly represented by German and French scholars.
Whether it is a case of choosing one reading from a myriad of existing ones or selecting a concrete (or abstract) specification from a more general reading, it is the context that points out the appropriate one (Geeraerts 2010: 15). The same context works both ways - on the one hand, it elicits more specific (or novel) readings (the okkasionelle ones) from the more general ones (the usuelle meanings) but, through sufficient frequency of use, it also promotes the more specific readings into more general ones. In Paul’s (1920 [1880]) deliberations we can see the first formal realization of the usage-based theory of meaning. It seems to suggest that semantic meaning should also be seen from a pragmatic point of view and be sought at the utterance level, in both theoretical and practical terms. We will see that his account in essence permeates all of the accounts to follow and serves as the basis of the reasoning behind corpus-based semantics.

2.1.2 Structuralist Semantics

The turn towards a different theoretical approach within semantics informally started with the previously mentioned important ideas emerging from the historical-philological paradigm, most importantly with the realization of the psychological nature of meaning. It formally started with an outright attack on this idea by Leo Weisgerber (1927: 170), whose objections were primarily aimed at the fact that a psychological view of language seemed to exclude any view of it as a symbolic system. This constituted the first theoretical proclamation of Structuralist Semantics and was picked up by other authors as well (for example Trier (1931); Saussure (1916); Harris (1954); or Lipka (1972)). The central structuralist conception of meaning was built on the idea that language must be perceived as a system - a symbolic system with properties and governing principles controlling the linguistic signs.² As the familiar argumentation goes, linguistics should disregard all other aspects of language as linguistically irrelevant and describe natural language as an independent symbolic system. The fact that we describe the linguistic sign as being a part of a system implies that we must primarily characterize the given sign in terms of its relations to other signs within the system (Geeraerts 2010: 47-53). It is this realization, coupled with one form of the contextual view of meaning (if not quite credited to Hermann Paul), which was significant for the development of the usage-based semantic paradigms within Structuralist Semantics.

2.1.2.1 Lexical field theory

The basic outline of the lexical field idea can perhaps be explained by pursuing the following line of reasoning: if reality is seen as a space and is divided into conceptual plots, a lexical field would then be a set of semantically related lexical items whose meanings are mutually interdependent and which provide conceptual structure for a certain domain of reality (Geeraerts 2010: 53). The conjured-up image is that of a mosaic within which semantic knowledge is distributed into a number of small adjoining areas. The theory of lexical fields, related to this image, started with Leo Weisgerber (1927) and Jost Trier (1931). Their structuralist claim was that only a mutual demarcation of the words under consideration can provide a decisive answer regarding their value (Geeraerts 2010: 53). Again, the point is not to consider a word in isolation but only in relationship with other related words (Saussure 1916). The important question that arose was which relations to imagine as existing in these contextual plots. They chose paradigmatic relations as units of organization within lexical fields, following the view that linguistic signs are a unity of form and meaning and that there is no formal relationship between the said form and meaning (Geeraerts 2010: 55).
2 The essence is still best understood if one looks at Saussure’s well-known example of comparing language with the game of chess (1916).

However, gradually it became clear that words may have specific combinatorial features which should indeed be included in the analysis of their meaning. This marked a shift from seeing such combinations as purely syntactic to viewing them as having a dimension of meaning within themselves and thus being of strong interest to semantics (Porzig 1934). The natural question that then arose was this: while the field conception introduced by both Weisgerber (1927) and Trier (1931) takes into account semantic relations of similarity, should it not also encompass formal relations and consider co-occurrences between words as well? This question was actually related to the original Saussurean distinction between the paradigmatic and the syntagmatic axes of language (Saussure 1916: 175). Paradigmatic features, as we saw, denote associations of similarity of meaning and form, while the syntagmatic features refer to the combinatory possibilities of constructing larger units such as compounds, derivations, and sentences (Geeraerts 2010: 57). Another reason why syntagmatic relations were seen as important to include came from the re-confirmation of Paul’s (1920 [1880]) idea that the environment in which a word occurs could and should be used to establish its reading (Geeraerts 2010: 59). It followed that any difference in meaning entails a difference in distribution (as indicated later on by, for example, Firth (1957) or Harris (1954)). This means that syntagmatic differences are also expressions of different meanings. It seems to follow, then, that if we can objectively chart the distributional differences among word senses and formally characterize their distribution, we can avoid the subjective interpretative methodology we normally find in semantic analysis.
Despite these realizations, syntagmatic relations did not receive sufficient attention until the 1950s/60s and Neostructuralist Semantics, spurred on at the time by the rapid developments of information technology (IT) and its use in MT.

2.1.3 Neostructuralist Semantics³

Even though it, on the one hand, shows significant differences from the classical structuralist methodology (as for example in acknowledging the psychological nature of meaning), this theoretical framework, on the other hand, still represents in its essence the culmination of the structuralist semantic idea of the possibility of componentializing meaning (Geeraerts 2010: 124-126). Two major strands can be recognized within this paradigm. One strand attempts a reconciliation between the reductionist tendencies of componential analysis (represented within Structuralist Semantics by, for example, Pottier (1964/1965), Coseriu (1962), and Greimas (1966)) and the expansionist tendencies of the cognitive modeling stemming from Generative Semantics (and authors such as Katz and Fodor (1963)).⁴ It states that the concepts we possess in our minds are clear-cut and simple. The fuzziness only appears when we apply them to the ‘real’ world, which is not so clearly delimited. This notion implies that semantics represents the stable level and pragmatics, referring to the application of the conceptual level to the ‘real world’, the flexible and unstable one. Well-defined overarching semantic descriptions may then be formed and only modified at the pragmatic level under the influence of contextual factors.

3 The majority of the names of linguistic and semantic theories used in the book are based on Geeraerts (2010).
The other strand turns to further developing the possible array of lexical relations to be included in semantic descriptions (and it is linked to the tradition established by the relational methodology within Structuralist Semantics and authors such as, for instance, Lyons (1963)). These approaches are, to a smaller or larger degree, all tied to IT and computational linguistics. Both strands, though, can be seen as having a link to the idea of the contextual governing of meaning. The componential strand sees the context as modifying conceptual representations which can be recognized in the semantic realization of meaning. The relational strand sees the relationship between the contextual features and lexical meaning as the major point of semantic description. Two individual approaches illustrate these strands well.

4 Note that it is different from cognitive semantics which does not attempt such reconciliation, but rather proposes a fully maximalist approach to semantics.

2.1.3.1 Generative Lexicon

Stemming out of the componential neostructuralist semantic approaches, this model of word meaning was devised originally by James Pustejovsky (1995; 1998; 2006). It is characterized by one focal issue - the view that meaning goes beyond a simple list of word senses and should incorporate the (obviously pragmatic) dynamics of language. The model envisages a number of procedures for forming semantic interpretations of lexemes in particular contexts. It first considers the systemic knowledge stored for each lexeme, seeing it as a pattern with different types of information in its configuration. The information comprises argument structures, specifying the number and nature of the arguments to a predicate; event structures, which define the type of the event; lexical inheritance structures, providing taxonomic relations; and qualia structures, which are sets of different descriptive characteristics accounting for words or phrases, such as the relation between a given object and its constituents or the perceived function or purpose of the object (Pustejovsky 1995: 85). Of all the reductive approaches, the Generative Lexicon is the most productive and the most sophisticated, and it has had a strong influence on all componential-based approaches. One of the possible problems this model may have is in drawing a principled distinction between semantic information and pragmatic or extralinguistic factors (Geeraerts 2010: 156). Another problem is the general issue of finding objective criteria for identifying the elementary building blocks of any componential analysis of meaning, including the Generative Lexicon. Nevertheless, it is underpinned by sound principles tying it to the corpus-based semantic view of language. The distributional properties of a sense of a polysemous lexeme are seen as one of the major dynamic factors describing that given sense. Contextual conditioning is inherent in each of the proposed argument structures as their pragmatic drive. It is the linguistic context which is once again perceived as the moving force behind the dynamic understanding of lexical meaning.

2.1.3.2 Meaning-Text theory

The original semantic relations introduced by Lyons (1963) were in truth solely paradigmatic and represented a restricted set of linguistic associations. Igor Mel’čuk felt that the scope of paradigmatic relations could be further perfected. An extended array of semantic relations is then the basis of the relational neostructuralist semantic Meaning-Text theory (Mel’čuk 1988; 1995; 1996; 1998) and its lexical functions.
The new set of lexical functions extends to encompass semantic, grammatical, and morphological relations. The set of paradigmatic lexical functions included is much broader than Lyons (1963) posited and is complemented, quite essentially, by a whole set of syntagmatic functions as well. They are to be seen as playing a crucial role in the description of lexical-syntactic patterns and in conditioning meaning. To take Dirk Geeraerts’ example, if a noun denotes an action we may define a lexical function that determines a verb which takes the agent of the given action as its syntactic subject and the given word as its direct object - such a function would then associate, for instance, ask with question (2010: 162). Lexical functions devised in this way do not only specify paradigmatic semantic relations, but also the syntagmatic restrictions between lexemes (Mel’čuk 1996: 99).⁵ The Lexical Functions model is thus a fertile and versatile framework for the semantic description of the lexicon - the Explanatory Combinatorial Dictionary represents a richer source of information than WordNet, for example (Geeraerts 2010: 163). Three issues arise regarding this methodology. The first one is concerned with the question of whether the set of lexical functions associated with a particular word suffices for the semantic description of that given word and all of its semantic content. The fact that the entries in the Explanatory Combinatorial Dictionary contain an analytic, standard dictionary definition next to a relational (lexical functions) description indicates that the lexical functions are not meant as a substitute for the more traditional form of semantic description, but rather as an addition to it (Geeraerts 2010: 164). The second issue is centered on the ever-recurring problem of the linguistic vs. encyclopedic distinction.
For example, the theory does not include the meronymic relation within its list of lexical relations because it is considered to belong to the extralinguistic description of the lexeme (Mel’čuk 1996). Other scholars (for instance Fontenelle (1997) or Ramos et al. (1995)) have, on the other hand, argued on behalf of the practicality of including such relations (Geeraerts 2010: 165). The third point of criticism is aimed at the universality of some of the lexical functions, most notably the syntagmatic ones. If we were to apply literal translation to the pattern of one language, the restrictions of the lexical functions would be revealed. For example, literally translating from German into English might suggest that ‘to place a question’ (‘eine Frage stellen’) is the correct English equivalent, which it obviously is not (Geeraerts 2010: 165). The model nevertheless represents one more application of the contextual idea of meaning in practice and is therefore relevant to consider as one of the proponents and predecessors of the more outspoken usage-based approaches to language we will see later on. Some of the issues arising within it also have a bearing on the ultimate evaluation of the overall idea.

5 So far the Meaning-Text Theory has been applied to Russian and French and it distinguishes more than 60 lexical functions. They occupy the central position in the major product of the approach, the Explanatory Combinatorial Dictionary.

2.1.3.3 Distributional corpus analysis

The underlying aim of this model is to focus on the syntagmatic environments in which a word occurs (as opposed to the previously seen dominance of paradigmatic relations). It provides much more insight into the nature of the distributional behavior of lexical items. Based on statistical methods, Distributional corpus analysis opened up an innovative and dynamic methodology through the employment of large machine-readable corpora (Geeraerts 2010: 166).
There is a natural methodological link between syntactic theory and distribution-based lexicology, but the major advances within this model did not come from the syntagmatic-oriented paradigm (which was, to some extent, seen in some of the previously presented models), but rather from the application of a distributional way of thinking about meaning to large corpora (Geeraerts 2010: 168-169). The three major characteristics making this approach crucial for semantic research are: it is usage-based, bottom-up, and data-driven; it is statistically grounded; and it comes with a strong technological background (Dobrić 2009a: 52-53). Statistical processing is of great importance for the model as it answers the question of how to deal with the problems of chance and the issues of handling large amounts of quantitative data. The statistical applications in thinking about contextual distributions brought about a link with another class of quantitative distributional models. They include natural language processing (NLP) word space models, which comprise all (mostly practical) models that structure the meanings of a word in terms of context vectors (Geeraerts 2010: 174-176; Navigli 2009: 26).⁶ Particularly relevant applications of the NLP models are word sense disambiguation (WSD), text mining, and the construction of knowledge bases. The application in WSD will be of particular importance later on when we put our proposition of a corpus-based semantic model into practice. The major limitation of the methodology lies in the size and extent of the semantic information included in corpora as compared to the fullness of semantic information possessed by language users. Do corpora exhaust it or not, and how big and well-constructed should a corpus be to be representative enough of human semantic knowledge? This is also something we will look into later.
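To make the notion of a context vector more tangible, the following is a minimal sketch in Python and not a description of any particular word space model from the literature. It assumes a tiny invented corpus, a fixed window of three tokens on each side of the target word, raw co-occurrence counts, and cosine similarity as the measure of distributional closeness; the names `SENTENCES`, `context_vector`, and `cosine` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import sqrt

# Toy corpus invented for illustration; it is not data from this book.
SENTENCES = [
    "the leaf fell from the tree in the park",
    "a green leaf grew on the tree",
    "she tore a sheet of paper from the book",
    "he wrote on a sheet of paper in the bookshop",
]


def context_vector(target, sentences, window=3):
    """Count the words found within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm_u = sqrt(sum(value * value for value in u.values()))
    norm_v = sqrt(sum(value * value for value in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


leaf = context_vector("leaf", SENTENCES)
tree = context_vector("tree", SENTENCES)
sheet = context_vector("sheet", SENTENCES)
paper = context_vector("paper", SENTENCES)

# Words that share contexts end up with more similar vectors.
print(cosine(leaf, tree) > cosine(leaf, paper))
print(cosine(sheet, paper) > cosine(sheet, tree))
```

In this toy setting leaf patterns with tree and sheet with paper, even though no stop-word filtering or weighting is applied. Models used in practice typically work with far larger corpora and with weighted rather than raw counts, so the sketch should be read only as an illustration of the principle.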
6 Context vectors in NLP are vectors whose dimensions represent the contextual features typically co-occurring with a given word.

2.2 Fully usage-based approaches

As may have been made apparent by all of the approaches presented so far, the view of language that permeates them sees particular uses of a given lexical item as dependent on their place within a multidimensional network comprised of linguistic (paradigmatic and syntagmatic) contextual information. The central tension causing many of the listed insufficiencies of the given approaches was the one between a minimalist and a maximalist understanding of lexical semantics. It was the Cognitive Semantics paradigm that resolved this conflict, and it did so by explicitly embracing a maximalist position. Cognitive Linguistics (or rather Cognitive Semantics) is theoretically grounded in several main tenets: a belief in the contextual flexibility of meaning, the conviction that meaning is primarily a cognitive notion, and the principle that meaning involves the embodiment of experience (Evans and Green 2006: 27). Of crucial importance for the establishment of these pillars was giving up the insistence on the distinction between semantic and encyclopedic knowledge. This truly represented a milestone in the development of lexical semantic theory as it implied that it is no longer necessary to draw the borderline between strictly definitional and descriptive features (Geeraerts 2010: 183). An additional consequence of taking in the encyclopedic perspective was that encyclopedic information was now understood as not having the form of single concepts corresponding to single lexical items. Knowledge was actually seen as organized in broader categories and larger chunks of knowledge (Geeraerts 2010: 183).
Even when fully committed to this cognitive view of meaning, a way of representing these larger chunks was needed, together with some manner of linking the relevant lexical items to that broad knowledge structure. One way was to perceive language against the background of different types of contexts: psychology, language use, and a broader cultural and historical background. The fully maximalist position taken by Cognitive Semantics also implied completely abandoning the structuralist model of a purely linguistic level of language. By moving away from any form of (solid) linguistic representation, Cognitive Semantics has put itself in a methodologically unfavorable position. Positing theories based mostly on the (by default subjective and mostly non-empirical) perception of the workings of the mind is very perilous, since that is one place where we cannot have any direct insight whatsoever (Teubert 2010). Stemming from such an important methodological realization, a model of lexical meaning which was to amend the said methodological issue had to link up both to the maximalist approach of Cognitive Semantics (for its more inclusive account of lexical meaning) and to the reductive approaches of Structuralism and Neostructuralism (for their firmer methodological grounding). The new model then had to bridge the observed theoretical and methodological gaps in both camps.

2.2.1 Construction Grammar and Frame Semantics

One of the first incarnations of such a bridging model was the Construction Grammar approach. It was originally proposed by George Lakoff (1987), who imagined constructions as representing pairings of form and meaning, regardless of the linguistic expression embodying the form (lexeme, multi-word expression, clause or sentence) and regardless of the type of the link between them.
These pairings of form and meaning are such that some aspects of either the form or of the meaning are not predictable from the component parts of the construction (Goldberg 1995: 4). The pairings of meaning and form are also to be seen at all levels of description (including morphemes, idioms, etc.). Thus they represent a ‘what you see is what you get’ approach to syntax, implying no surface vs. deep structure division within it (Goldberg 2006: 5). Since the constructions are learned as a section of the general cognitive input we receive, they are expected to vary crosslinguistically. On the other hand, there are scenes which can be understood as universal as well. What follows is that the meaning of a construction surrounding a particular lexeme is not motivated exclusively by the intrinsic lexical meaning(s) (and cognitive background) of that word but also by its use in combination with the particular structure of the construction it is a part of. This idea was further developed within the model’s direct descendant, Frame Semantics (Fillmore 1975; 1977; 1978; 1982; 1985 and Kay and Fillmore 1999). Frame Semantics sees meanings as relativized to frames or scenes, which in essence represent background general knowledge used in inferring meanings. They are an idealization of coherent individual memory, experience, action or objects (Fillmore 1975: 212). As an example we can take the frame of a commercial transaction. Understanding frames to be prototypical descriptions of scenes, we can imagine that a commercial transaction must have a buyer and a seller, a place where it takes place, a certain time when it occurs, etc. Some parts of frames are more set and inalienable, such as the buyer or the seller, while others tend to be more open, such as the time and place of the commercial transaction.
If we consider how the meanings of buy, sell, trade, exchange or barter can be interpreted within such a frame, we can understand how Frame Semantics (and ultimately Construction Grammar as well) accounts for lexical ambiguity and novel meanings. The greater the correspondence between the construction at hand and basic sentence types, the more central and intrinsic the encoded meanings will be, both in their relevance to basic human experience and in their frequency of use (Goldberg 1995: 50). If we were to imagine a universal frame of a commercial transaction as taking place in a shop with one person, the buyer, buying something (say groceries) and giving something in return (usually money) to another person, the seller, we can also imagine the verbs buy and sell having their central meanings of ‘to acquire the possession of’ and ‘to transfer (goods) to or render (services) for another in exchange for money’ respectively. In another frame, where a person, an employee, is telling another person, his boss, a lie about why he was late for work, the sell in ‘He sold him the story.’ and buy in ‘The boss bought the lie.’ can be seen as meaning something quite different and definitely less central.

2.2.2 Pattern Grammar

One more incarnation of the more linguistically grounded maximalist account of lexical meaning stems from John Sinclair’s (1991) observation that there is a clear association between meaning and syntactic patterns; the core of this paradigm is basically how certain patterns select particular readings of a given lexeme (Hunston and Francis 1999: 29). Sinclair is also responsible for recognizing the underlying principle of language being encoded through meaning and lexis rather than through grammatical choices, which are actually a consequence, not a cause (1991: 8).
The process flows as follows: speakers of a language have in mind certain meanings they wish to encode; these meanings routinely attract certain constructions to lexicalize them, incorporating the required lexis and grammar; eventually a certain pattern of a word emerges and can be understood as all the words/structures regularly associated with the given lexeme which contribute to its meaning (Hunston and Francis 1999: 30). Such patterns in turn occur relatively frequently with particular meanings of the given lexeme. Their co-occurrence is dependent on the word choice conditioned by the communicative situation. The patterns are also to be observed through their formal constituents rather than through their structural interpretation (Hunston and Francis 1999: 151). The road towards a semantic description then lies in the procedure of investigating the patterns regularly associated with any one lexeme. It involves first identifying the given patterns in a large sample of concordances. We must bear in mind, however, the fact that the very same pattern can be associated with more than one lexeme. For example:

[4] Marina looked at the book.
[5] Marina looked for the book.
[6] Marina looked after the van as it was leaving the driveway.
[7] Marina looked after the van as it was left to her by her uncle.

Looking at these complex lexico-grammatical forms and discerning which and what kind of readings can occur within them would perhaps help in discriminating the difference in meaning between look at in example [4] and look for in example [5]. Investigating the next two examples, [6] and [7], in the same manner would however not yield much along the lines of drawing definitions from the given pattern. They are very similar, yet clearly associated with entirely different senses of the verb look.
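The difficulty illustrated by examples [4] to [7] can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the sense labels are assigned by hand and are ours, the pattern is reduced to the verb look followed by one of three prepositions, and the names `CONCORDANCES`, `PATTERN`, `group_by_pattern`, and `ambiguous_patterns` are hypothetical.

```python
import re
from collections import defaultdict

# Examples [4]-[7] from the text; the sense labels are hand-assigned
# assumptions made for this illustration only.
CONCORDANCES = [
    ("Marina looked at the book.", "direct one's gaze at"),
    ("Marina looked for the book.", "search for"),
    ("Marina looked after the van as it was leaving the driveway.",
     "follow with one's gaze"),
    ("Marina looked after the van as it was left to her by her uncle.",
     "take care of"),
]

# A deliberately crude surface pattern: a form of 'look' plus a preposition.
PATTERN = re.compile(r"\blook(?:s|ed|ing)?\s+(at|for|after)\b", re.IGNORECASE)


def group_by_pattern(lines):
    """Map each surface pattern to the set of senses observed with it."""
    groups = defaultdict(set)
    for sentence, sense in lines:
        match = PATTERN.search(sentence)
        if match:
            groups["look " + match.group(1).lower()].add(sense)
    return groups


def ambiguous_patterns(groups):
    """Return the patterns that occur with more than one sense."""
    return sorted(p for p, senses in groups.items() if len(senses) > 1)


groups = group_by_pattern(CONCORDANCES)
for pattern in sorted(groups):
    print(pattern, "->", sorted(groups[pattern]))
print("ambiguous:", ambiguous_patterns(groups))
```

Grouping by surface form separates look at from look for, but it places [6] and [7] under the single pattern look after, although a human reader assigns them different senses. Telling the two apart requires information beyond the formal pattern itself.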
This issue of every pattern not being a ‘pattern’ (which means that identifying meaningful patterns automatically is almost impossible) raises the question of how far we can take, in practical terms, this association of syntax and meaning. This is a question central to the research presented further on.

2.2.3 Emergent Grammar

Picking up on the ongoing discussion of the problem of the arbitrariness of the match between syntax and function, the Emergent Grammar paradigm suggests a radically pragmatic model of meaning and is also in the tradition of the pragmatically revised cognitive semantic models. The revision starts by addressing the erroneous assumption that there are any independent decontextualized grammatical forms with functions housed within some abstract mental system of rules we somehow implement as we speak (Hopper 1987: 141). In line with other usage-based models, Emergent Grammar refuses to acknowledge that grammar is an object set apart from the speaker and the uses it may be employed for, that it preexists discourse, or that it is in any way dualistic and divided. The approach represents a way out of the form-to-function-to-form trap by seeing both syntax and meaning as real-time social (discourse) phenomena (Hopper 1987: 142). Structure and regularity do not come from some abstract deep level of grammar but rather are constructed (and reconstructed) by the discourse itself. They are shaped by language use and they then shape language use in return. Think, for example, of the trend we may see in some varieties of American English in which prepositions dangle at the ends of sentences, such as in ‘Where are you at?’. The grammatical structure observed in such cases deviates from the standard contemporary norm but can in time reshape the norm itself and come to be considered standard, given sufficient frequency of use within the discourse.
The same can be even more easily applied to function and meaning, where the redefining of senses happens extremely frequently. The grammatical form is then only to be seen, akin to lexis, as negotiable in communication and only reflecting speakers’ past experiences of these forms (or rather their experiential entrenchment in the discourse the speakers are a part of). Furthermore, this paradigm also sees the grammatical structure not as a set of abstract rules but rather as a reflection of the systematic (and frequent) use of individual words in combination with each other. Form-meaning pairings are thus the basic units of language, with certain pairings becoming conventionalized, and they are both instances of use and constructions stored in our memory (Bybee 1998: 251). The level of conventionality depends on the level of sedimentation of the structures formed by the token frequency of use (Bybee 1985: 117; Langacker 1987: 59). We can see that some very frequently occurring patterns become fully automatized and are processed as single units (as for instance multi-word expressions). The manner of exploring these functionally conditioned forms and the links between them centers on investigating patterns that arise from these community-forged pairings of form and function, and is best undertaken by employing corpus concordances. The same words of caution, difficulties, and limitations mentioned previously with Construction Grammar, Frame Semantics, and Pattern Grammar apply here as well.

2.2.4 Situational Semantics

This exemplar-based paradigm centers on the external significance of language and the way we use language to communicate information by describing situations in the world. The situations here are understood as states of affairs and courses of events in which objects have properties and stand in relation to each other (Barwise and Perry 1983: 42). The image is one relatively similar to the previously described frames within Frame Semantics.
The dependence and the conditioning of lexis of the formal grammatical structure is seen in the productivity of language - the same forms can be reused countless times and used anew in never before seen ways. The only thing that matters is the interpretation of these forms (or even only portions of them since not all parts of an utterance constitute parts of its interpretation as well). The interpretation is in turn completely reliant on the situation in which the form is used and which it describes making grammar completely dependent on the given (contextual) perspective - which is termed as the perspectival relativity of language (Lindström 1991: 744). It is this external significance of language that should be seen as crucial since it relates the only visible aspect of the mental significance of language - the communicative roles which forms may have and the attitudes language conveys (Barwise and Perry 1983: 40). The crux of the model is in emphasizing the contextual dependence in lexical semantics which is claimed to be impossible to explore without referencing to what are traditionally called pragmatic factors such as the communicative intentions of the speaker and the hearer and similar principles of communication (Searle 1969; Grice 1975). Aspect, tense, voice, and other traditional grammatical notions are to be understood as completely context-driven and hence more or less inessential when separated from the meaning and the intention of the message they are designed to convey. For example, my intention to communicate that, at a certain period of time, I ate a piece of chocolate cake and had a cup of tea would require the selection of a certain situation (similar again to a frame entailing the cake, the cutlery I used, the supposed table I sat at, a chair I sat on, the venue I bought the cake at, etc.) which would in turn require a selection of certain entrenched and agreed upon language forms (including particular morphemes, tenses, etc.) 
by which I could express this situation. The utterance ‘Two hours ago I had a lovely piece of chocolate cake and had some tea with it as well.’ takes this particular form because of the salient link between the situation I intend to express and the experientially conditioned norm and form I am to express it by, not the other way around.

2.3 The common theoretical ground

There is one tenet we can propose as stemming from the overall discussion and that is the acknowledgement of the contextual and discourse-referential nature of language. Meanings are in essence constructed and reconstructed within the multimodality of language itself. Any imagined conceptual patterns residing in human minds cannot be seen as arising only from perceptual experience nor as interpreted only by referring to that kind of experience. The language-specific environment plays a singular role in conceptualization as well. Conceptualization is not only dependent upon pre-linguistic or non-linguistic experience but also upon the reinforcement by the discourse which surrounds us (Teubert 2010). All of the presented theoretical approaches in essence, whether willingly or not, discard any such dualistic nature of knowledge. They all share the view that the meaning and function of words can be extracted (up to varying points depending on the approach at hand) from the multidimensional distributional patterns constituting the context of the use of the target word. The lengthy discussion we presented so far and this short summary of the main idea stemming from it have thus led us to a point where we can state the corpus-based semantic stand on the nature of lexical meaning. Being a fully usage-based/exemplar-based model as well, it sees the semantic content of a lexeme as fully motivated and defined by its use.
The choice of a form by which a given conceptual notion is expressed in language is fully usage-based; the level of entrenchment of one reading and of the strength of its link with a given form is also fully usage-motivated (revolving around the frequency of that usage and co-usage); the number of readings we can associate with a given form (its spectrum of polysemy) is also dependent on use; and ultimately the disambiguation of a lexeme is also completely dependent on the previous patterns of use we were exposed to. The outright statement then is that lexical meaning must be observed as a pragmatic phenomenon. Every imaginable pairing of form and meaning exists in potentia and is only brought into existence (in a more or less salient incarnation, depending on the frequency of their co-occurrence) by a particular instance of language use. Meanings are to be seen as entirely dependent on context only, differing in the strength of their association with given forms (creating, thus, constructions, situations or frames in an on-line process). This means that we can dispense with all of the traditional divisions into general vs. abstract (idiomatic) senses or stored vs. contextually dependent ones. All of the motivation behind such divisions stems from the observation that there are certain readings of a polysemous lexeme which can be seen as more closely linked to it, as opposed to the senses that seem less connected to the given linguistic form. In corpus-based semantics, this observation and the divisions arising from it (as in all the related usage-based accounts of lexical meaning we have presented) are explained much more satisfyingly by the contextually-conditioned frequency of co-occurrence of form and meaning within a given discourse. Hermann Paul laid it out perfectly in his original differentiation between usual and occasional meaning (1920 [1880]: 75). The very terms he employed mirror its complete usage-based nature.
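The frequency-based reading of the distinction between usual and occasional meaning can be illustrated with a minimal sketch. It assumes a small, hand-made list of sense-annotated occurrences of a single form; the sense labels, the data, and the 0.25 cut-off are hypothetical and chosen purely for illustration, not taken from the research presented here.

```python
from collections import Counter

# Hypothetical sense annotations for ten occurrences of one form (e.g. 'head').
# Each entry is the reading an annotator assigned to one corpus sentence.
observed_senses = [
    "body_part", "body_part", "leader", "body_part", "leader",
    "body_part", "top_of_object", "body_part", "leader", "body_part",
]


def sense_profile(senses, usual_threshold=0.25):
    """Return {sense: (relative_frequency, label)} for a list of sense tags.

    The threshold separating 'usual' from 'occasional' readings is an
    illustrative assumption, not an established value.
    """
    counts = Counter(senses)
    total = sum(counts.values())
    profile = {}
    for sense, count in counts.items():
        share = count / total
        label = "usual" if share >= usual_threshold else "occasional"
        profile[sense] = (share, label)
    return profile


for sense, (share, label) in sorted(sense_profile(observed_senses).items()):
    print(f"{sense}: {share:.2f} ({label})")
```

In this view a reading is never usual or occasional in itself; its label can shift whenever the sample, the discourse, or the cut-off changes.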
Our corpus-based semantic account argues further that most of the conceptual background of language also derives from language use, or more accurately, from the frequency of the need to refer and denote conceptual representations of the ‘real’ world imposed by communication itself. We will see, for instance, that notions such as prototypicality can be quite successfully explained by drawing on the frequency of use data alone. To sum up the theoretical view of corpus-based semantics, potential form and meaning(s) couplings only become visible when used and as such are not to be observed in isolation or in a static fashion. The strength of this coupling, quite dynamically as well, depends on the frequency of their occurrence together in discourse, which produces both more common and less common form-meaning combinations (usual and occasional meanings). Regardless of the strength of their combination, the content of such coupling is to be determined only within the context of use and it is a fully dynamic lexical phenomenon. A methodological question that is naturally posed here is how to investigate this dynamic semantic dance between form and meaning in practical terms and how to apply it in practice (such as for example in lexicography)? The answer to this question can perhaps be found by identifying salient complex lexico-grammatical forms and discerning which and what kind of words and meanings can occur within them, which of them are semantically compatible (not semantically identical) with the constructions’ intrinsic meaning, and if any pattern or any regularities in their distribution can be discovered (Gries 2010: 338). Such insight into the context would indicate meanings (or some aspects of meanings) of empty slots within a particular construction which would then be transferred and would condition any lexical item which would not be restricted in filling the given slot. 
The practical methodology based on this link between the syntax and the lexicon and its application in lexicography is precisely the one being explored further on.

3. The practice of corpus-based semantics

After investigating the theoretical grounding of this methodology in the previous chapter, we can have a closer look at its practical background and the manner of its concrete implementation. The theoretical cornerstone, stemming from the previous theoretical discussion, is the clear and evocative correspondence between the usage-based principles (seen as governing meaning and syntax alike) and the links between them. Starting with this assertion, corpus-based semantics in its concrete application adds a dimension of coherence to lexical semantic research, among other things by utilizing representative corpus samples, insisting on quantitative data, and incorporating statistical processing. It attempts to dispense with dependence on human-based semantic input as much as possible. Our methodology focuses on the attractive possibility of forging a more objective and criteria-based link between the linguistic context, as a setting which invites the creation of a certain form-meaning pairing, and the semantic content thus invited. The practical implementation of the theory we elaborated on in the previous chapter seemingly comes with a downside as well. We have, as stated several times so far, confirmed the commitment of corpus-based semantics to observe the context of use as the pragmatic place of motivation for semantic content being linked to lexical items. The practical downside we implied refers to the fact that the reading of each of the lexical items in the context of one sentence, for example, is dependent on the reading of each other lexical item in that given context and of their combined meaning as well.
In practical terms it is not possible to replicate the semantic inference necessary to disambiguate a lexeme in such a human manner. Rather, our model centers on the usage-based reasons we can recognize behind the described complex human process of inferencing - frequency of use and co-use of the forms themselves within the linguistic context only. One more fact, stated also several times before, is that the strength of the links between forms and the meanings they can link to themselves is dependent on their co-occurring with each other and with other forms in a wider context. Applied corpus-based semantics focuses on these two notions - the co-occurrences of the forms with meanings and other forms and the frequency at which individual cases occur. It looks at the ways the various morpho-syntactic (and some semantic) properties of the linguistic microcontext condition the activation of form-meaning links without disambiguating all of the lexical content. In terms of its development, corpus-based research methodology is in fact related to various practical research models geared towards individual semantic tasks and particular types of semantic analyses (Baroni and Lenci 2010: 3). Some of the related practical applications include (Gries and Otani 2010: 121-122): Co-occurrence Approach (Rubenstein and Goodenough 1965; Firth 1957; Bolinger 1968; Hanks 1996); Substitutability Approach (Deese 1962; Charles and Miller 1989); Microfeatures Approach (Waltz and Pollack 1985); Similarity-based Methods (Dagan, Marcus and Markovitch 1993; Grishman and Sterling 1993); and the mentioned Generative Lexicon (Pustejovsky 1995). They are perhaps best generally represented by a relatively recently emerging model designed to erase the problem of ‘one task one model’ - Behavioral Profiling (BP) (Gries (2006); Divjak and Gries (2009); Stefanowitsch and Gries (2003)).
It is this contemporary model which will be used as exemplary further on as an amalgamation of the various existing models mentioned above. Of all the said models it also stands as the most linguistically and technologically transparent practical application of corpus-based research methodology. Therefore, it is a perfect candidate to demonstrate the application of the theoretical reasoning behind corpus-based semantics.

3.1 Steps in corpus-based semantic research

The implementation of any corpus-based semantic analysis (exemplified here through the workings of the Behavioral Profiling model) generally involves several important and time-consuming steps (Dobrić 2010: 98-99):
STEP 1: first it is necessary to compile a representative corpus sample of the target word(s) one wishes to investigate. It is important to bear in mind here both the representativeness of the corpus used (Dobrić 2009a: 52-53) as well as the statistical relevance of the number of sample sentences and the manner of their (random) selection (Dobrić 2009b: 104);
STEP 2: the second step is to compile a set of morpho-syntactic and semantic markers (a tagset) which will be sufficiently representative of the context within which the distributional characteristics 7 of the target word are to be attested (including here sense-tagging the target word itself). It is important to carefully decide which individual building-blocks of the micro-context (sentence or clause) are to be marked, guided by both the representativeness of the potential ontology used to define the tagset as well as the perceived ease of its subsequent identification;
7 Termed also ID tags (Atkins 1987), weighted word-link word tuples (Baroni and Lenci 2010: 4), and context vectors (Navigli 2009: 26).
STEP 3: the third step is the most daunting and extremely time-consuming one. It involves manually (or semi-automatically at best) annotating the corpus compiled in step one for all of the tags decided on in step two.
If one imagines that an average of about 60 to 70 different tags are assigned to every microcontext, the amount of work to be done for any statistically relevant-size corpus sample is clearly substantial;
STEP 4: the fourth step can be seen as an extremely elaborate counting and correlating procedure. The goal is to produce an outline of all of the senses of the target lexeme with their correlations to the wide variety of the tags they were attested as co-occurring with; 8 and
STEP 5: the fifth and final step is interpreting and evaluating the results of the processing. Ideally, relevant results about the distributional properties of the readings of the target polysemous lexeme should be obtained.

To test this exemplary model of a corpus-based analysis, based most immediately on Behavioral Profiling, and in essence evaluate the overall applicability of corpus-based semantics, we will embark on a very practical research exercise. The research will involve going through all of the listed steps that such an analysis entails and discussing the results obtained by their application to the ever-burning issue of polysemy.

8 This time-consuming procedure can (reportedly) be replaced or supplemented by using very elaborate statistical procedures, most notable being hierarchical cluster analysis. For more on the procedure and the measurements used, see for example Gries (2006). The research we present further on performs the whole procedure manually in an effort to break it down into its constituent parts and thus better understand both the process and the results obtained.

3.2 Polysemy

Before we can start the discussion of the usefulness of the methodology as applied to this all-inclusive semantic phenomenon, a brief reminder of the open issues regarding polysemy is perhaps necessary at this point.
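Before the polysemy discussion continues, the counting and correlating described under STEP 4 of Section 3.1 can be made concrete with a minimal sketch. It assumes that the annotation of STEP 3 has already produced, for every sample sentence, the sense of the target word and the set of context tags attested with it; the senses, the tag names, and the five-sentence sample are hypothetical, and a real sample would carry far more tags per microcontext.

```python
from collections import Counter, defaultdict

# Hypothetical annotated sample: (sense of the target word, context tags).
annotated_sample = [
    ("motion", {"subject:animate", "tense:past", "clause:main"}),
    ("motion", {"subject:animate", "tense:present", "clause:main"}),
    ("operate", {"subject:inanimate", "tense:present", "clause:main"}),
    ("operate", {"subject:inanimate", "tense:past", "clause:subordinate"}),
    ("motion", {"subject:animate", "tense:past", "clause:subordinate"}),
]


def behavioral_profiles(sample):
    """For each sense, return the share of its occurrences carrying each tag."""
    tag_counts = defaultdict(Counter)  # sense -> Counter of tags
    sense_totals = Counter()           # sense -> number of occurrences
    for sense, tags in sample:
        sense_totals[sense] += 1
        tag_counts[sense].update(tags)
    return {
        sense: {tag: n / sense_totals[sense] for tag, n in counts.items()}
        for sense, counts in tag_counts.items()
    }


profiles = behavioral_profiles(annotated_sample)
for sense, profile in sorted(profiles.items()):
    for tag, share in sorted(profile.items()):
        print(f"{sense:8s} {tag:20s} {share:.2f}")
```

Relative rather than absolute frequencies are used here so that senses with different numbers of occurrences remain comparable; this is a design choice of the sketch, not a claim about how any particular study normalizes its counts.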
To start off very traditionally again, it is a well-known fact that the systematic linguistic research of the plurality of meaning started tentatively in the mid-19th century with Historical-philological Semantics and Michel Bréal, who actually coined the term (1897: 145). However, as we will see in the first part of our discussion, traditionally the notion received little attention within the structuralist and, for the most part, generativist theories of language. It only reappeared with sufficient vigor within Neostructuralism and, more importantly, within Cognitive Semantics and Lexical Pragmatics.

3.2.1 The traditional view of polysemy

As mentioned above, the basis of the discussion that was to continue all the way to contemporary deliberations on polysemy was in essence first worded in philosophy by John Locke and Gottfried Leibniz in the 17th century. Their debate touched upon the issue of whether plurality of meaning (in their case of the linking word but) was to be observed as a phenomenon that incorporated various individual and separate meanings connected to one word (as Locke (1975 [1689]) proposed) or should be seen only as a case of one very abstract and general meaning which is instantiated through specific communicative situations as various more specific but closely related extensions (as claimed by Leibniz (1996 [1765])). Translated into more contemporary terms, the issue was of the sense enumeration lexicon approach to polysemy (which presupposes that a polysemous lexeme consists of different listed and stored meanings, the way words are usually presented in dictionaries) as opposed to the idea of the core meaning paradigm (which sees polysemous lexemes as being defined by one maximally general meaning which is constant and present in all of its extensions) (Geeraerts 1993: 223-224). Linguistics took over the consideration of polysemy from philosophy, the historical-philological approach being the earliest linguistic theory to do so.
Almost entirely ignoring the philosophical discussion preceding it, the paradigm saw polysemy in quite different terms. Polysemy was understood as a form of lexical ambiguity resulting from the process of diachronic semantic change by which new meanings have been established in language, while the older, more redundant ones are still present in the lexicon. For Historical-philological Semantics polysemy was a non-issue since discourse was seen as solving all of the problems of possible ambiguity. Structuralist Semantics disregarded any and all previous deliberations on the nature of polysemy and saw it simply in terms of its opposition to homonymy and in terms of the semantic relations between different meanings within it. Such a view of polysemy was prevalent all the way through several decades of Structuralist Semantics’ dominance, during which it was not given, for the most part, sufficient attention. 9 The structuralist take on polysemy was actually only significant because its considerations reiterated and raised significant questions regarding the nature of polysemy rather than answering them. The actual answering of said questions required a number of approaches to be developed. Two major and one minor methodology arose with sufficient force to meet the challenge of describing lexical ambiguity - Neostructuralist Semantics (stemming in essence from Structuralist and Generativist Semantics), Cognitive Semantics, and, especially relevant, Lexical Pragmatics (Nunberg 1979; Blutner 1998; Falkum 2011).

3.2.1.1 Representation according to Neostructuralist Semantics

Following the componential tenets inherited from Structuralist Semantics, this paradigm asserts that the complex structure of formal representations must be invoked in order to capture the intricacies of the relations between the senses within a polysemous lexical item (Geeraerts 2010: 124-126).
If we look at this general consideration first adopting the core meaning theory, we are to understand the model as one including a single general meaning from which all other readings are seen as contextually derived (Ruhl 1989: 21). Such a monosemous view is not very popular because, if represented linguistically, it often fails to provide minimally sufficient definitions (for the core meanings) which can cover all of the remote senses of certain lexemes. That is why the more common interpretation within all decompositional approaches is to see polysemy through the sense enumeration lexicon prism. This entails imagining each meaning of a polysemous word as separately stored and associated with its own set of necessary and sufficient conditions.

9 Several noted analyses of polysemy were performed within the structuralist paradigm (including Apresjan (1973); Ullmann (1975); and Lyons (1977)), most of which actually did not yield any new insights, but are still interesting from the point of view of their empirical research design.

Building on such a decision and following the discussion of polysemy and issues such as its cognitive nature and the novel ways of its formal representation opened up by Katz and Fodor (1963) within Generativist Semantics, several formalist approaches developed within the neostructuralist paradigm: Anna Wierzbicka’s (1985) Natural Semantic Metalanguage, Ray Jackendoff’s (1996) Conceptual Semantics, and Manfred Bierwisch’s (1987) Two-Level Semantics. Several objections can be posed regarding the neostructuralist account of polysemy inherent in each of the individual approaches. The first objection is regarding its general view of polysemy which understands meanings as listed in an enumerative fashion (which all of the major formalist models do). It, in fact, is not a practically feasible option. Firstly, it puts too much strain on the mental lexicon since meaning extensions can be seen as contextually infinite.
Secondly, it reduces polysemy to pure accidental semantic arbitrariness. Thirdly, it completely disregards the dynamic nature of lexical meaning. Among the problems the decompositional approaches have is also the lack of any possibility to properly define lexemes and their semantic content using any kind of proposed primitive semantic (necessary and sufficient) building-blocks. Another problem is also the inability of primitive-based definitions to encompass the vagueness and the typicality effects of lexical concepts. More flexible accounts of polysemy needed to emerge - Cognitive Semantics and Lexical Pragmatics.

3.2.1.2 Representation according to Cognitive Semantics

As already stated in the previous chapter, Cognitive Semantics is a theoretical approach that tries to make peace between various extreme linguistic theories and semantic tenets such as encyclopedic vs. linguistic knowledge, conceptual vs. semantic content, etc. It does so in the case of polysemy as well. Cognitive Semantics takes the middle ground in the opposition between the core meaning and the stored meanings view of polysemy. In its later development Cognitive Semantics sees some senses as stored while others are understood as produced by the context (see for example Tyler and Evans (2003)). 10 It makes such an assumption by proposing a radically different view of word meaning based in essence on typicality effects (Evans and Green 2006: 342-352).

10 Though the early model did actually represent a cognitive-based version of the sense enumeration lexicon (see for instance Lakoff (1987)).

Prototype theory in Cognitive Linguistics sees categories of knowledge organized around a prototypical representative of the given category with members closer or further away from that center. The border of the given category is seen as fuzzy and in fact overlapping with other categories. This conceptual model can quite successfully be superimposed onto lexical concepts.
The early application of prototypicality to polysemy suggested that meanings of a polysemous lexical item are organized around a semantic prototype with meanings closer to it being more core and prototypical than the peripheral ones (Fillmore 1982: 125-126). A hurdle that this early model could not overcome was that it was unable to account for complex concepts. For example, a budgerigar is a prototypical PET BIRD, but not a prototypical BIRD (that being perhaps a sparrow) nor a prototypical PET (a cat sufficing in this case) (Fodor 1998: 174). Another big theoretical insufficiency was the fact that many concepts lack a clearly attestable prototype, such as KNOWLEDGE or BELIEF. Finally, people understand a given lexical concept and its polysemous extensions without having to rely on prototypicality information. For instance, people can access the various meanings of the abstract noun knowledge without having a clear idea of what a prototype of the category KNOWLEDGE would be. The first alteration to the early prototype model came from Lakoff (1987). He asserted that the real application of family resemblance effects and prototypicality should be sought in the existence of more comprehensive and more complex patterns of general knowledge found in Idealized Cognitive Models (ICMs). An ICM is a relatively well-entrenched system of experiences and knowledge that arises from them having to refer to a particular concept with sufficient frequency. There is no hierarchical gradation within the ICM and the entrenched knowledge serves to highlight denotates in terms of them being more or less typical instances of the category. Additional factors producing typicality effects are cluster concepts (Lakoff 1987: 74) which are formed out of various different ICMs and the features that represent them. The presence and absence of the given features characterizes one denotate as being closer to or further from the totality of all ICMs in the cluster.
This model of knowledge representation actually gave rise to the predominant contemporary Cognitive Semantics view of polysemy (though analyzed and presented in a myriad of variations and improvements) - that of a radial category. 11

11 For a more detailed account see Brugman (1988); or Brugman and Lakoff (1988).

In short, radial categories involve a central cluster concept which is combined with a number of extensions all representing certain variants of the given central ICM cluster. In this way lexical content is seen as mirroring the conceptual organization of the mind and all word meanings are then organized according to a prototypical central sense. The idea is that all of these different senses can be seen as stemming from one prototypical sense exhibiting different degrees of typicality. Put all together they form a meaning chain of attribute-related senses (Brugman and Lakoff 1988). Less prototypical senses are derived from more prototypical ones through various mechanisms of meaning extensions (such as metaphorization and metonymization), leaving some marginally connected senses to have quite different and at first glance unrelated meanings. The problem with this methodology was that it assumed, in line with the early Cognitive Semantics’ ties to formalist and computer-based linguistics, that all of the meanings are stored in the mental lexicon. This has been criticized as unfeasible because it would again entail an endless storage capacity of different senses in order to cover the full range of possible lexical concepts, which can be assumed as virtually unlimited (Geeraerts 1993: 234-236). This full specification approach to polysemy also fails when it comes to the absence of any criteria that can be used to discern one meaning from another, apart from subjective intuition. Having in mind all the given inconsistencies, there are several paths of improvements the original ICM-based radial network can take.
The first thing to be modified in the ICM and radial category approach was to soften its perspective on the number and the extent of senses understood as stored in the mental lexicon. Such a softened approach, as for example the Principled Polysemy model (Tyler and Evans 2001), involves seeing two types of senses in order to avoid problems of the polysemy fallacy. There are the senses entrenched and stored in the lexicon and there are those pragmatically derived by context. This difference is seen as occurring between two opposing poles - ambiguity (including homonymy and polysemy) which represents the stored senses and vagueness which stands for contextually specified meanings (Geeraerts 1993: 258-260). In essence, the major contribution of Cognitive Semantics is in defining some meanings as more central and stored in the mental lexicon, and others as less salient and only contextual extensions of the more entrenched senses. The central (and presumably stored) meanings are the ones perceived as prototypical with their depth of storage decreasing with their levels of prototypicality, the very peripheral members being vague rather than ambiguous. Such a way of modeling polysemy, even though not fully in congruence with the corpus-based semantic view of the lexicon, nonetheless suggests quite plainly the focal points in need of analysis: defining and delimiting each of the readings involved; identifying the prototypical reading and the levels of prototypicality (and sedimentation) of the other readings; and looking into the relationship and the possible links and paths of semantic extension between readings themselves. All of these points of analysis revolve around the linguistic context of use.
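One of the focal points just listed, the relationship between the readings themselves, can be approached in a first and very rough way through the linguistic context alone. The following is a minimal sketch under strong assumptions: each reading is represented only by the set of context tags it was attested with, and the overlap of two such sets (the Jaccard coefficient, a technique chosen here for simplicity and not one named in the text) is taken as a crude indicator of how closely two readings are linked. The readings and tags are hypothetical.

```python
from itertools import combinations

# Hypothetical context tags attested with each reading of one lexeme.
reading_contexts = {
    "physical_motion": {"subject:animate", "adverbial:direction", "intransitive"},
    "fast_motion": {"subject:animate", "adverbial:direction", "adverbial:manner"},
    "manage": {"subject:animate", "object:organization", "transitive"},
}


def jaccard(a, b):
    """Shared tags divided by all tags attested with either reading."""
    union = a | b
    if not union:  # two readings without any tags share nothing measurable
        return 0.0
    return len(a & b) / len(union)


def pairwise_overlap(contexts):
    """Return {(reading_1, reading_2): overlap} for every pair of readings."""
    return {
        (r1, r2): jaccard(contexts[r1], contexts[r2])
        for r1, r2 in combinations(sorted(contexts), 2)
    }


overlaps = pairwise_overlap(reading_contexts)
for pair, score in sorted(overlaps.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A high overlap only shows that two readings occur in similar contexts; whether that reflects a path of semantic extension between them remains a matter of interpretation.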
3.2.1.3 Representation according to Lexical Pragmatics The other end of the extreme, a complete opposite to the sense enumeration lexicon, is to see all meanings as just a problem of reference, clearly deducible from the pragmatic context alone (Nunberg 1979: 156). The highlighting of senses is seen as working both through linguistic deixis and through a process of deferred reference (Falkum 2011: 205). This entails referring indirectly to another entity in close relationship with the intended referent which is then successfully identified by employing common knowledge, following from the notion of cooperative principles of communication (Pethö 1999: 24). This would account for novel uses and eliminate the need for the supposed endless storage capacity for readings necessary in our minds. Other pragmatically-based theories build further on such a representational image, one improvement achieved by adding the theory of conversational implicatures (Blutner 1998) to the just stated original pragmatic-based conception. Another addition, stemming from Relevance Theory (Sperber and Wilson 1995), is in accounting for polysemy and contextual influence by referring to the underspecification of intended concepts in the produced lexical item which needs to be resolved through pragmatic (contextual) inferencing. The modified model, relatively close to the less radical cognitive approaches, is theoretically the single closest to the preferred usage-based view of meaning given previously and favored by our methodology as well. Its view of polysemy, in combination with some of the aspects of it stemming from Cognitive Semantics (most importantly the category membership in semantics), will indeed be the basis of the stand taken by our research further on. 
3.2.2 A usage-based account of polysemy

The discussion of the traditional views of polysemy has made quite clear that the formalist approaches have trouble explaining the various vagaries of lexical meaning through their componential methodology. They fail at the definitional level, not being able to provide fully encompassing definitions of senses. They also, for the most part, fail to account for typicality effects. They fail at the criteria level, the criteria not being clearly presented in the definitional practice these approaches stand for. They fail in their sense enumeration approach to polysemy as well. Finally, they also fail to contribute sufficiently to clearly distinguishing between vagueness and ambiguity. A more appropriate account of polysemy, as well as of meaning itself, is then perhaps to be found within the Lexical Pragmatics and Cognitive Semantics (modified) approaches. The most basic Cognitive Semantic outline of polysemy is that of a radial category representation based on ICMs and family resemblance typicality effects. Mirroring conceptual organization, lexical meanings were seen as organized around a prototypical central meaning, all other senses deriving from it by various mechanisms of meaning extension. This early ‘full specification’ approach was criticized because it imagined all senses as stored in the mental lexicon, which created too finely grained sense distinctions (Sandra and Rice 1995: 90). The early model was softened by the understanding that, while some meanings need to remain seen as stored and entrenched in order to account for the systematicity of polysemy (evident in conventional polysemy and in diachronic semantic change for example), the role of the context also had to be acknowledged more by allowing some meanings to be purely contextual extensions (Tyler and Evans 2003: 38). Such a softened approach has several variations though.
One such variation, bordering on radical pragmatics and quite close to Lexical Pragmatics, can be found in Dirk Geeraerts’ (1993: 288-262) conclusion on the inconsistencies of polysemy tests. He follows the reasoning of Bosch (1979), who suggests reinforcing the distinction between vagueness and ambiguity by first identifying a bridging context that brings the given meanings together, and notes that the distinction between polysemy and vagueness cannot be clearly maintained due to the extraordinary contextual flexibility of lexical meaning. Resembling again Lexical Pragmatic theory, meaning should rather be seen as a process - the process of meaning construction through context. The prototypicality view maintained by Cognitive Semantics is preserved though, evoking the Vantage Point theory of semantics proposed by MacLaury (1991), where the context highlights the meaning <?page no="46"?> necessary for the given domain rather than having it fixed in advance (Geeraerts 1993: 260). This kind of approach thus challenges the very possibility of objective lexical semantics, since there seems to be no space left for objective meanings but only for contextual interpretations. This is a very significant point to be picked up later on.

A similar version of the cognitive/pragmatic approach to polysemy, only outlined in more detail, can be seen in David Tuggy’s (1993) approach. The idea, stemming from Langacker (1987), is that cognitive schemas exist in the mental lexicon to the extent to which they become entrenched through repeated usage. The better entrenched they are, the more salient they become, occurring more forcefully and being activated more easily by the context (Tuggy 1993: 274). If a schema exists, but is not salient enough, the lexical item is then polysemous and lies between full ambiguity (homonymy) and vagueness.
The influence of context in meaning selection comes as a given within this cognitive model, as it is one of the major factors enhancing entrenchment, with frequency determining the strength of subsequent contextual activation (Tuggy 1993: 285-286). This model basically follows Geeraerts’ (1993) understanding of the unclear border between polysemy and vagueness (seen also in Cruse (1986); Sinclair (1991); Sandra and Rice (1995); and Sandra (1996)), but is closer to the general cognitive semantic view of meanings as both stored and contextually specified. One more relevant and theoretically similar semantic interpretation is seen in George Dunbar’s (2001) critique of Geeraerts’ (1993) and Tuggy’s (1993) elaborations on the nature of polysemy and meaning. It sees lexicalized concepts as using encyclopedic knowledge in the creation of a conceptual representation outlined by the given discourse situation (Dunbar 2001: 11). Parameters within lexical concepts are sometimes specified by the concepts and are sometimes vague, the choice of a reading being influenced by background knowledge. Some of these concepts are stored separately by learning, accounting for the attested different numbers and types of concepts elicited from different people and the different subjective levels of awareness of the relatedness between concepts (Langacker 1987: 17). Since this conceptual history of acquisition and origins of concepts is not available consciously, people only engage in metacognitive reflection on meaning with the context as a knowledge background (Dunbar 2001: 12).

Paul Deane’s (1987; 1988) approach departs from the cognitive model towards an even more pragmatically-orientated theory, but still retains some aspects of the cognitive semantic view on polysemy (such as the prototype theory). The starting point for this theory is also extralinguistic knowledge, the point of departure <?page no="47"?> being that the approach does not recognize any prototype model within polysemy.
Rather, it sees polysemy as confusion between various aspects of meaning abstraction and a lack of certain semantic features stemming from encyclopedic knowledge (Deane 1988: 345). Another later addition to this Referential Theory is the emphasis on the influence of context, stemming from Relevance Theory (Sperber and Wilson 1995) and notions of pragmatic attention (Deane 1987: 148).

To summarize all of the outlined cognitively/pragmatically motivated models, they are (for the most part) centered on the view that the variety of meanings seen in polysemy (and vagueness) are, in essence, products of linguistic context, disambiguated through pragmatic reference. Each of the more applicable theories of polysemy we saw just above fully accepts the crucial role context plays in meaning production/disambiguation. They, in essence, present polysemy as a pragmatic rather than a lexical notion, thus in fact coming full circle back to the similar deliberations on its nature made a century ago within Historical-philological Semantics. The manner in which meanings are produced, in which we can disambiguate them, the way they interact, and the manner of their storage and conceptualization in our minds all seem to be products of contextual situations of use. Only after attesting this context of use can we account for the polysemy and vagueness of given lexemes in a way that the context at hand allows. This contextual salience seems to derive from three related pillars of polysemy stemming from the cognitive/pragmatic accounts seen above, all of them internally motivated by the context and frequency of use. The first pillar is seen in how linguistic context (syntactic and semantic) helps in distinguishing clearly between senses.
The limitation of the corpus-based semantics applied here is that it is restricted to the linguistic context, because observing any other level of contextual use (such as the extralinguistic ones entailing background encyclopedic knowledge) is still not, in any practical way, within the grasp of quantitative linguistic modes of analysis. The second pillar is the family category organization of senses identified by looking at their prototypicality and how it is conditioned by contextual features. Regardless of whether we view senses as stored, contextually driven or both, one thing that most theories of polysemy agree on is that some senses seem to be more salient than others. They come to mind sooner when invited by a certain lexical form, appear more often in communication, and seem to be acquired faster than others. Whether the motivation for such a state of affairs is cognitive, sociolinguistic or statistical (as we suspect), the fact is that <?page no="48"?> it is easily attestable and apparently an integral part of lexical meaning and polysemy. The third and final pillar has to do with how the contextual features facilitate the semantic links between the distinct senses within a given lexical (semantic) network. Important insight into how meanings interact within one polysemy spectrum and how this interaction derives from contextual factors can be gained by investigating the linguistic context first.

The following analysis, based broadly on Gries (2006) and the Behavioral Profiling methodology (as representing one of the latest and most exemplary incarnations of corpus-based semantic methodology), will be founded on these three pillars. It will investigate sense distinctness and its relation to the linguistic context, the prototypicality of senses within contextual constraints, and the lexical network structure as defined by the linguistic patterns of language use, looking within the polysemy of the verb look.
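As a rough illustration of how the three pillars can be read off annotated data, the sketch below builds a simple behavioral profile for each sense, that is, the share of a sense's sentences in which each contextual tag was attested. It is a minimal sketch of the general idea only, not the procedure of Gries (2006) or of the present study: the toy annotations, the tag labels, and the function names `behavioral_profiles` and `profile_distance` are all invented for the example, and the Manhattan distance is merely one convenient choice of measure.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations: (sense code, set of attested tags).
# Sense codes follow Table 3.1; the tag labels are invented for illustration.
annotations = [
    (1, {"tense:past_simple", "subject:human", "R1:at"}),
    (1, {"tense:present_simple", "subject:human", "R1:at"}),
    (13, {"tense:present_simple", "subject:inanimate", "R1:like"}),
    (13, {"tense:past_simple", "subject:human", "R1:like"}),
    (11, {"tense:present_progressive", "subject:human", "R1:for"}),
]

def behavioral_profiles(rows):
    """Return {sense: {tag: share of that sense's sentences carrying the tag}}."""
    tag_counts = defaultdict(Counter)
    sense_counts = Counter()
    for sense, tags in rows:
        sense_counts[sense] += 1
        tag_counts[sense].update(tags)
    return {
        sense: {tag: n / sense_counts[sense] for tag, n in counts.items()}
        for sense, counts in tag_counts.items()
    }

def profile_distance(p, q):
    """Manhattan distance between two profiles over the union of their tags."""
    tags = set(p) | set(q)
    return sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in tags)

profiles = behavioral_profiles(annotations)

# Pillar 1 (distinctness): how far apart are two senses in their contexts?
d_1_13 = profile_distance(profiles[1], profiles[13])

# Pillar 2 (prototypicality): raw sense frequency as one possible indicator.
sense_frequency = Counter(sense for sense, _ in annotations)

# Pillar 3 (network): pairwise distances between all senses.
senses = sorted(profiles)
network = {
    (a, b): profile_distance(profiles[a], profiles[b])
    for i, a in enumerate(senses)
    for b in senses[i + 1:]
}
```

In such a setup a small distance between two profiles would suggest that the senses share most of their linguistic context, while the matrix of pairwise distances is the kind of input on which a cluster analysis of the sense network could be run.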
3.3 Corpus-based semantic analysis - Steps 1 to 3

As we have indicated, for the purpose of applying this methodology to the most persistent semantic issue of polysemy one polysemous lexical item was chosen - the verb look. In order to conduct our corpus-based semantic analysis of the given lexeme we have to refer to the previously listed steps of any such analysis, exemplified here through emulating the described Behavioral Profiling procedure. If we recall, the first three steps of any such analysis involve, in practical terms, compiling a representative corpus, assembling a representative tagset, and annotating the given corpus with the given tagset.

3.3.1 The corpus sample

The starting point of our corpus analysis was the Corpus of Contemporary American English (COCA) 12 which is possibly the best online corpus of <?page no="49"?> general English, 13 though representative of the usage in the USA. Put together at Brigham Young University and covering texts from 1990 onwards, it numbers over 425 million words from spoken and written sources. The spoken part (76 million words) comes from transcripts of unscripted conversation from nearly 150 different TV and radio programs. The written texts come from various sources: fiction (70 million words) containing short stories and plays from literary magazines, children’s magazines, popular magazines, the first chapters of first edition books 1990 to the present, and movie scripts; popular magazines (78 million words) from nearly 100 different periodicals, with a good mix (overall, and by year) between specific domains (news, health, home and gardening, women, financial, religion, sports, etc.); newspapers (73 million words) gathered from ten newspapers from across the US, with a good mix between different sections of the newspapers, such as local news, opinion, sports, and financial; and academic journals (73 million words) (Davies 2011). What puts it above other such corpora is, firstly, the fact that all of its features are freeware and, more importantly, the way in which the search options and the displayed results are organized. Every corpus search can be customized with various options, such as the type and the register or alignment of collocates regarding the word in question. Likewise, the results obtained are displayed comprehensively, across registers, types, and time periods, in percentages and charts, in a very clear and user-friendly way. The corpus is part of speech (POS) tagged using the CLAWS tagger 14 ensuring POS searchability and up to 97% POS tag accuracy (Garside 1987: 119). The full concordance of the lexeme look, in its entire paradigm, comes up to 271,160 tokens. Out of the full concordance 27,116 instances (10%) of look were selected by random sampling 15 - 15,645 of which were verbs and 11,471 nouns (including multi-word expressions). Out of the full selection of verbs the final sample, after eliminating wrongly <?page no="50"?> POS tagged examples and duplicate sentences, ended up at 13,119 sentences containing the verb look in all of its forms.

12 There are two reasons why a corpus of American English was used in combination with dictionaries of British English: firstly, the COCA corpus is probably the largest and the most representative corpus of the English language (of use in the USA) and it is also far more optimized and user-friendly than its possible equivalent the British National Corpus (BNC) (which is in fact used as a source of data for the evaluation procedure given later on). Secondly, the dictionaries of British English were quite applicable to the uses of the verb look in American English: apart from some rather archaic usages (such as for example ‘look something/someone out’ meaning ‘search for and find something or someone’) designated as exclusively of British English use, all other senses were indicated as fully compatible in both varieties of English.
13 The criteria terming this corpus as the best online corpus of English are its representativeness, size, availability, and search features (for more see Dobrić (2009a) or Dobrić (2012)).
14 The CLAWS tagger is a system for automatically assigning, to each word in a text being processed within a corpus, a marker of the grammatical class to which that word belongs (for more on this tagger see Garside (1987)).
15 The random sample numbers were obtained using the RANDOM.ORG program for generating random numbers (http://www.random.org).

3.3.2 The verb look

The first and the most problematic (both theoretically and practically) tag to be set up within step 2 of the analysis is the semantic marker of the sense itself. It is also one of the most crucial tags since it represents the basis of all of our training data. Two issues arose: the make-up of the list of word senses of the verb look required for the training procedure; and the necessary level of sense granularity. Since the whole exercise was done for the purposes of exemplifying and evaluating a particular methodology of semantic analysis, it was easy to answer the latter question and decide on a finely grained selection of senses. Such sense granularity would give us a more detailed insight into the nuances of the model at hand. To obtain the selection itself, however, two new issues had to be considered: firstly, it is in practical terms quite impossible to capture all of the possible interpretations and productive uses of a lexeme in any constructed list of senses (Kilgarriff 2006); and secondly, any enumerative approach to word senses involves using one of several equally subjective sources, namely dictionaries (paper or machine-readable), parallel corpora, or native speaker intuition (Ide and Wilks 2006).
Since contemporary lexical semantic theory offers no more elegant solutions, the decision was taken to select and paraphrase the senses based on several more prominent monolingual dictionaries of English (New Oxford English Dictionary, Oxford English Dictionary, and The Oxford Dictionary of Idioms). One of the reasons why dictionary entries have been chosen instead of some of the other listed means of sense distinction is that all of these means are equally subjective and theoretically flawed. Another reason is that using dictionaries (paper or electronically readable ones) is not very time-consuming and is also the option most commonly employed within the lexical semantic research community. Combining and paraphrasing definitions from the said dictionaries we extrapolated 42 different senses 16 of look as a verb (including <?page no="51"?> its multi-word idiomatic expressions) 17 and all are listed in Table 3.1 (the first column marked as C represents the code of the senses which is later employed in more complex tables and figures due to constraints of space). Presented in this way, the given list of senses is nothing more than a relatively subjective account of all the possible meanings of the verb look (or rather their paraphrases). It was altogether compiled by expert lexicographers and as such based on corpus data and solid methodology, but it still lacks any systematicity and replicable objectivity in the criteria originally used in their demarcation. It is also ultimately only a list of senses, not providing any data on their organization, their interrelatedness or the contextual factors that condition them. Hence, it is only to be used as a starting point in the further semantic analysis, functioning as basically one more type of tag in our tagset.

16 Only contemporary senses of the verb look were chosen for the analysis while all of the ones listed as archaic or obscure (found mostly in the OED) were not taken up due to the very small probability of finding them in a synchronic corpus and due to their obvious disuse in communication.
17 The analysis of the multi-word expressions containing the verb look was also limited to the non-compositional meanings, so expressions such as for example ‘look over’ or ‘look through’ were not analyzed as they can be quite easily interpreted from their constituent meanings in a compositional manner.

3.3.3 The tagset

In order to represent the (linguistic) microcontext in which a particular sense appears, a wide array of possible tags was devised. Apart from this contextual representativeness as a guideline, another criterion for the selection of possible tags, mentioned in the introduction, was their ease of identification, as the tagging was to be done by the people (advanced university students of English) working on the CASIS project (which housed the research presented in this book and will be described in detail further on).

<?page no="52"?>
Table 3.1 Senses of the verb look compiled from dictionaries.
C   SENSE
1   direct your gaze towards someone or something or in a specified direction
2   for a structure or part of body to have a view or outlook in a specified direction
3   express or show something to someone
4   ignore someone or something by pretending not to notice them
5   inspect something briefly
6   peruse a book or other written material
7   move round in order to investigate something
8   think of or regard in a particular manner
9   examine and consider
10  investigate in great detail
11  attempt to find
12  evaluate someone or something with a quick glance
13  have the appearance of
14  show likelihood of
15  appear your usual self
16  rely on someone or something
17  expect (hope) to do something
18  have the appearance of being as old as you are
19  express a perceived air of superiority
20  view someone with superiority
21  observe someone without showing embarrassment or fear
22  ignore wrongdoing
23  make future plans
24  evaluate someone carefully
25  take care of someone or something
26  reminisce past events
27  suffer a setback
28  eagerly await something or someone
29  pay a short visit
30  observe an event without getting involved
31  quickly take notice
32  bring an improvement to a situation
33  pay a social visit while going somewhere else
34  search for and find particular information in a piece of writing
35  have respect for someone
36  have a downcast or mournful look
37  express emotion (anger) by look or glance (look daggers at)
38  suggest to someone to be quick at doing something (look alive; look lively; look sharp)
39  do not act before considering the possible consequence (look before you leap)
40  appear weak or unimportant (look small)
41  search for and produce something (look something out)
42  question the quality of a gift or a favor received (look a gift horse in the mouth)

<?page no="53"?>
As presented in Table 3.2, grammatical markers of tense, aspect, and voice were selected as representative yet easily attestable features of the microcontext and the lexeme’s
distribution. Other syntactic tags are also of the traditional persuasion together with collocates which have a confirmed influence on meaning. The last ontological system used in the analysis is found in the form of semantic tags (including the already mentioned senses and other semantic denominations such as animate, inanimate, concrete, abstract, and more). All in all, every one of the 13,119 sentences from the corpus sample has been manually attested for 115 predefined and up to 85 non-predefined (collocates one or two places to the left and to the right of the target lexeme and frequently co-occurring subordinators) statistically relevant contextual features (Table 3.2) and 42 predefined dictionary senses (of which only 35 were found).

COUNT  PREDEFINED FEATURES
1    Tense: Present simple
2    Tense: Present simple 3rd
3    Tense: Present progressive (person)
4    Tense: Past simple
5    Tense: Past progressive (person)
6    Tense: Present Perfect (person)
7    Tense: Present Perfect Progressive (person)
8    Tense: Past Perfect
9    Tense: Past Perfect Progressive
10   Tense: will + inf
11   Tense: will Progressive
12   Tense: will Perfect
13   Tense: will Perfect Progressive
14   Tense: going to
15   Tense: modal present (verb)
16   Tense: modal past (verb)
17   Tense: Present Participle
18   Tense: Past Participle
19   Tense: to infinitive
20   Aspect: simple
21   Aspect: progressive
22   Aspect: perfect
23   Aspect: perfect progressive
24   Voice: active
25   Voice: passive
26   Verb: intransitive
27   Verb: transitive (intentionality/quasi transitive)
28   Verb: complex transitive (intentionality/quasi transitive)
29   Verb: linking verb
<?page no="54"?>
30   Sentence/clause: declarative
31   Sentence/clause: interrogative
32   Sentence/clause: imperative
33   Sentence/clause: main
34   Sentence/clause: subordinate (with/no subordinator)
35   Sentence/clause: relative subordinate
36   Subject: human
37   Subject: animate
38   Subject: inanimate
39   Subject: concrete
40   Subject: countable
41   Subject: uncountable
42   Subject: abstract
43   Subject: proper
44   Subject: machine
45   Subject: location
46   Subject: quantity
47   Subject: possessive
48   Subject: expletive
49   Subject: simple
50   Subject: compound
51   Subject: phrase
52   Subject: clause
53   Subject: singular
54   Subject: plural
55   Subject: pronoun
56   Head: human
57   Head: animate
58   Head: inanimate
59   Head: concrete
60   Head: countable
61   Head: uncountable
62   Head: abstract
63   Head: proper
64   Head: machine
65   Head: location
66   Head: quantity
67   Head: possessive
68   Head: expletive
69   Head: simple
70   Head: compound
71   Head: phrase
72   Head: clause
73   Head: singular
74   Head: plural
75   Head: pronoun
76   Object: human
<?page no="55"?>
77   Object: animate
78   Object: inanimate
79   Object: concrete
80   Object: countable
81   Object: uncountable
82   Object: abstract
83   Object: proper
84   Object: machine
85   Object: location
86   Object: quantity
87   Object: possessive
88   Object: expletive
89   Object: simple
90   Object: compound
91   Object: phrase
92   Object: clause
93   Object: singular
94   Object: plural
95   Object: pronoun
96   Complement: human
97   Complement: animate
98   Complement: inanimate
99   Complement: concrete
100  Complement: countable
101  Complement: uncountable
102  Complement: abstract
103  Complement: proper
104  Complement: machine
105  Complement: location
106  Complement: quantity
107  Complement: possessive
108  Complement: expletive
109  Complement: simple
110  Complement: compound
111  Complement: phrase
112  Complement: clause
113  Complement: singular
114  Complement: plural
115  Complement: pronoun

COUNT     NON-PREDEFINED FEATURES (STATISTICALLY RELEVANT)
116  1    L1 collocate: after
117  2    L1 collocate: and
118  3    L1 collocate: at
119  4    L1 collocate: but
120  5    L1 collocate: by
121  6    L1 collocate: for
122  7    L1 collocate: forward
<?page no="56"?>
123  8    L1 collocate: he
124  9    L1 collocate: how
125  10   L1 collocate: in
126  11   L1 collocate: it
127  12   L1 collocate: just
128  13   L1 collocate: like
129  14   L1 collocate: she
130  15   L1 collocate: that
131  16   L1 collocate: to
132  17   L1 collocate: up
133  18   L1 collocate: what
134  19   L1 collocate: when
135  20   L1 collocate: which
136  21   L1 collocate: who
137  22   L2 collocate: at
138  23   L2 collocate: it
139  24   L2 collocate: out
140  25   L2 collocate: that
141  26   R1 collocate: after
142  27   R1 collocate: and
143  28   R1 collocate: around
144  29   R1 collocate: as
145  30   R1 collocate: at
146  31   R1 collocate: away
147  32   R1 collocate: back
148  33   R1 collocate: down
149  34   R1 collocate: for
150  35   R1 collocate: forward
151  36   R1 collocate: from
152  37   R1 collocate: if
153  38   R1 collocate: in
154  39   R1 collocate: into
155  40   R1 collocate: it
156  41   R1 collocate: like
157  42   R1 collocate: on
158  43   R1 collocate: out
159  44   R1 collocate: over
160  45   R1 collocate: that
161  46   R1 collocate: through
162  47   R1 collocate: to
163  48   R1 collocate: up
164  49   R1 collocate: what
165  50   R2 collocate: around
166  51   R2 collocate: as
167  52   R2 collocate: at
168  53   R2 collocate: for
169  54   R2 collocate: from
<?page no="57"?>
170  55   R2 collocate: how
171  56   R2 collocate: if
172  57   R2 collocate: in
173  58   R2 collocate: into
174  59   R2 collocate: like
175  60   R2 collocate: on
176  61   R2 collocate: over
177  62   R2 collocate: through
178  63   R2 collocate: to
179  64   R2 collocate: up
180  65   R2 collocate: what
181  66   Subordinator: after
182  67   Subordinator: and
183  68   Subordinator: as
184  69   Subordinator: at
185  70   Subordinator: but
186  71   Subordinator: for
187  72   Subordinator: from
188  73   Subordinator: how
189  74   Subordinator: if
190  75   Subordinator: in
191  76   Subordinator: like
192  77   Subordinator: that
193  78   Subordinator: then
194  79   Subordinator: through
195  80   Subordinator: to
196  81   Subordinator: what
197  82   Subordinator: when
198  83   Subordinator: which
199  84   Subordinator: while
200  85   Subordinator: who

Table 3.2 The tagset used in the analysis of the verb look.
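To make the shape of the resulting annotation more concrete, the following sketch shows one way a single annotated sentence could be stored and turned into a row of present/absent values. It is an assumption-laden illustration rather than a description of the actual annotation software: the class `AnnotatedSentence`, its field names, the five-tag subset in `PREDEFINED`, and the example sentence with its tagging are all hypothetical.

```python
from dataclasses import dataclass, field

# A small, hypothetical subset of the predefined tags from Table 3.2.
PREDEFINED = [
    "Tense: Past simple",
    "Aspect: simple",
    "Voice: active",
    "Voice: passive",
    "Subject: human",
]

@dataclass
class AnnotatedSentence:
    text: str
    sense: int                                        # sense code from Table 3.1
    predefined: set = field(default_factory=set)      # predefined tags attested as present
    open_features: set = field(default_factory=set)   # non-predefined features found

    def vector(self, open_inventory):
        """Binary row: predefined tags first, then the given open features."""
        columns = PREDEFINED + list(open_inventory)
        present = self.predefined | self.open_features
        return [1 if column in present else 0 for column in columns]

# Invented example sentence and tagging, for illustration only.
example = AnnotatedSentence(
    text="She looked at the painting.",
    sense=1,
    predefined={"Tense: Past simple", "Aspect: simple",
                "Voice: active", "Subject: human"},
    open_features={"R1 collocate: at"},
)

row = example.vector(["R1 collocate: at", "R1 collocate: like"])
```

Keeping the predefined tags and the openly collected features in separate sets mirrors the distinction drawn in Table 3.2, while the fixed column order lets rows from different sentences be stacked into one table per sense.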
3.3.4 Setting up the analysis

Apart from the POS division (into the verb and noun sample for the purposes of extracting only the verbs) provided automatically by the COCA corpus search, all tags we have listed were manually attested. The annotation procedure started, after the tagset had been defined, with every sentence being manually analyzed for all of the given features. The predefined features were to be attested as present or absent, while any related non-predefined co-occurring elements were identified through an open search. This manual analysis, representing step 3 of applied corpus-based semantics, was conducted <?page no="58"?> using the CASIS Tagger which was developed as a part of the said CASIS project.

The CASIS (Computer-Aided Sense Identification System) research project was funded by the Forschungsrat (Science Council) and the Sondermittel der Fakultät für Kulturwissenschaften (Special Funds of the Faculty of Humanities) of the Alpen-Adria-Universität Klagenfurt over the course of the last three years. It represents an attempt to facilitate the manual processing of a large corpus sample, to enable easier data extraction following the annotation, to enable word sense disambiguation, and to provide for practical evaluation. For this purpose the project involved cooperation with a computer expert, Thomas Hainsho, who oversaw the programming of the database, the web page, the tagger, and the sense disambiguator. The initial set-up stage resulted in the creation of two corpora: the CASIS Corpus and the CASIS Example Corpus. The CASIS Corpus consists, as we have already described, of 13,119 sentences extracted randomly from COCA; the CASIS Example Corpus consists of 1,392 random sentences sampled from the British National Corpus (in both samples containing the verb look, of course) and is intended, as will be described in more detail further on, for evaluation purposes.
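The random selection behind such samples can be sketched in a few lines. The study itself drew its random numbers from RANDOM.ORG; the snippet below swaps in Python's seeded pseudo-random generator purely so that the draw can be repeated, and the function name `draw_sample` and the seed value are assumptions made for the illustration. The figures are those reported for the COCA concordance of look (10% of 271,160 tokens).

```python
import random

def draw_sample(n_tokens, fraction, seed=0):
    """Return sorted 1-based concordance line numbers for a simple random sample."""
    k = round(n_tokens * fraction)
    rng = random.Random(seed)  # seeded so that the same draw can be reproduced
    return sorted(rng.sample(range(1, n_tokens + 1), k))

# 10% of the 271,160 concordance lines reported for look in COCA.
line_numbers = draw_sample(271_160, 0.10)
```

Sampling line numbers without replacement guarantees that no concordance line is selected twice, although duplicate sentences within the concordance itself would still have to be removed afterwards, as was done for the final sample.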
At this stage, however, the first task of the analysis was to manually annotate all of the microcontexts in the COCA Corpus. Four advanced students studying English at the Department of English and American Studies at the Alpen-Adria-Universität Klagenfurt, after being additionally trained, were responsible for the task: Alexandra Galler, Verena Novak, Pamela Prohaska, and Tjaša Žemlja. Each of them covered a section of the sample and was charged with identifying the given tags and inputting them electronically via the specially designed form in the prescribed corresponding code. 18 Once the tags had been entered for each microcontext, the statistical output could be generated in the form of Microsoft Excel tables. The tables would show counts and correlates of the given tags with the senses they had been identified with, as well as a record of the frequency of the senses themselves. Such output gave us all the information necessary to start with steps 4 and 5 and gauge the extent and nature of contextual influence on sense distinctness, sense prototypicality, and the lexical network construction of the senses of the verb look.

18 The actual tagging and word sense disambiguation (WSD) within the CASIS project can be attempted by any interested parties by logging in at http://casis.uni-klu.ac.at/casis_example/users/login - username/email: guest1@test.at, password: guest1.

<?page no="59"?>
4. Corpus-based semantics and sense distinctness

Having explicated the first three steps of any corpus-based semantic analysis, we turn in the following three chapters to steps 4 and 5 as applied to the three discussed aspects of polysemy. These two steps imply using the distributional profile of the verb look obtained through steps 1, 2, and 3 and applying it to sense distinctness (Chapter 4), the prototypicality of senses (Chapter 5), and the construction of sense networks (Chapter 6). This chapter is concerned with sense distinctness as perhaps the most central issue of polysemy.
It focuses on the ways of distinguishing between different readings of a polysemous word and on how distributional information can contribute to that process. It also addresses the lack of perceivably objective and quantifiable criteria for making either finely-grained or coarsely-grained sense distinctions.

4.1 Sense distinctness

To reiterate, the main problem with distinguishing between different senses of a polysemous word is that there seems to be an absolute lack of any objective and quantifiable criteria for doing so. This is all the more striking if we bear in mind the apparent ease with which some other (natural) sciences describe the core phenomena of their respective fields of interest, and also the existence of a myriad of approaches dealing with the issue of word sense disambiguation. It seems logical that the first stage of our research should be to take a good look at what these various linguistic approaches to discriminating between senses propose and to understand the problems that plague them. The discussion that may open up could point us in the right direction in addressing the problem more successfully within our defined framework of corpus-based semantics.

4.1.1 Human sense disambiguation

One of the most traditional, most time-consuming, and most commonly used ways of making sense distinctions is harnessing the power of native speaker intuition. This method of identifying senses involves selecting a sociolinguistically representative sample of (usually) native speakers of a given language and asking them to annotate a certain number of sentences containing the target word for senses. The informants should state the <?page no="60"?> meaning of the target word they recognize according to their (native-speaker) intuition.
Such an approach to sense identification can still be seen as central in contemporary lexicography, only the intuition-based results are more empirically supported by corpus data and the native-speaker informants are replaced by trained lexicographers. The major problem human sense disambiguation cannot possibly overcome is the fact that native speaker intuition (lexicographers included) cannot be ranked nearly as highly in semantics as it is in syntax, since it produces too much variation. Additionally, lexicographers have to decide on the number and layout of distinct senses they include in a dictionary, which they normally do on the basis of practical expediency and/or pedagogical considerations that play no part in actual language use. Another problem is that the procedure is quite time-consuming and costly, bringing with it the common methodological issues regarding the amount of corpus data that can be considered. The last pressing problem is the apparent subjectivity of the methodology which, even in the best cases when conducted by experienced lexicographers, still relies on no tangible criteria and produces different results even when confined to the practice of lexicography alone. In an attempt to remedy these drawbacks (especially the time-consuming one), a variety of non-human based methods have been developed.

4.1.2 Computer-based sense disambiguation

Starting in the 1950s and 1960s with the development of computers and language corpora, semantics and lexicography bore witness to a veritable Big Bang of computer-based methods for dealing with sense disambiguation. Following different ideas of what constitutes meaning in a formal manner and different paths of quantifying and processing the formal aspects of sense, several major models centering on IT can be recognized (Ide and Veronis (1998); Navigli (2009)).
4.1.2.1 Early machine translation approaches

The first attempts at machine-assisted sense disambiguation stem from the famous Weaver memorandum (Weaver 1949/1955). Following the major cryptographic developments of World War II, the memo emphasized the need for developing a system for automatic machine translation (MT). The first research into MT, focusing mostly on technical terminology (for example Kaplan (1955); Masterman (1961); or Reifler (1955)), showed reasonable success in disambiguation. <?page no="61"?> It pointed out the major issues in word sense disambiguation (WSD) such as the role of the context, the need for developing predefined lexicons, and the need for knowledge representations (Ide and Veronis 1998: 5). The main problems with this early development of computer-based WSD were the limited availability of lexical resources and the limited computational abilities of the computers of the day. For this reason these models only worked well with limited terminology. In anticipation of further technological and linguistic advances, the IT baton was passed to artificial intelligence (AI) attempts.

AI methods represented a whole new view of language processing where WSD was only one part of the overall scheme of language understanding these systems proposed to encompass (Ide and Veronis 1998: 6). As a rule all of the AI models were based on some kind of modeling of human understanding, whether through exploiting symbolic systems such as semantic networks (Masterman 1961) or frames (Hayes 1977) or by employing psycholinguistic priming and the activation of the semantic networks they produce (Meyer and Schvaneveldt 1971; Cottrell and Small 1983; Waltz and Pollack 1985). However, the AI-based approaches also had their problems, as they were extremely time-consuming to set up and again were of practical use only in very limited (usually technical) domains (Ide and Veronis 1998: 8).
The further development of WSD systems then depended on finding other sources of knowledge from which that knowledge could be extracted automatically (Navigli 2009: 2). One such source was found in large lexical databases such as machine-readable dictionaries and language corpora. The shift towards empirical (statistical) methods in WSD started in the 1980s with the appearance of the first machine-readable dictionaries. The primary focus was to extract sufficient amounts of knowledge from the dictionaries, the approaches varying in their perception of what knowledge is and how much of it is sufficient (e.g. Chodorow, Byrd and Heidorn 1985; Wilks et al. 1990; Lesk 1986). One thing the methods had in common was that they usually centered on the definitions of the target word and examined how often occurrences of the target word conformed to one definition or another. That data was further expanded by including other sources of information, for instance subject codes such as ANIMATE, HUMAN, etc. (Ide and Veronis 1998: 10). Machine-readable thesauri provided access to information on the relationships between words (in this case synonymy). The problem with dictionaries and thesauri was that they were primarily designed for human use and were hence not well suited as a <?page no="62"?> source for the automatic extraction of semantic knowledge (Ide and Veronis 1998: 12).

To fully facilitate the use of computer processing, the 1980s also saw the emergence of computational lexicons.¹⁹ Many WSD approaches have been based on lexicons, and perhaps the greatest number of individual attempts have been produced using WordNet. WordNet combines the features of a dictionary, by providing individual senses of a target word, with its own basic feature, the synset, which contains all of the synonymous words surrounding a given lexical concept (Fellbaum 2005: 667).
The synsets are hierarchically organized and further interlinked through a variety of additional semantic relations such as hyperonymy/hyponymy, antonymy, and meronymy. Such an abundance of semantic knowledge, coupled with its free availability, made WordNet attractive for a myriad of sense identification experiments (for example the whole range of the SENSEVAL projects (Edmonds and Kilgarriff 2002)). The reported problem of using WordNet in WSD is that its senses are too finely grained and do not allow for a decision as to how much of a sense distinction one needs for a particular use. Generative lexicons, which do not provide specific senses but allow senses to be generated from rules meant to capture generalities in sense creation, have not been exploited as much, because they are extremely hard to produce and are not freely available for use on a larger scale (Ide and Veronis 1998: 13-14). They are, however, relatively close to our model as they revolve around the dynamic idea of the underspecified description of semantic information.

4.1.2.2 Corpus-based methods

Even more akin to our proposed methodology are empirical statistics-based methods, which start from the statistical tendencies of language and the collocational distribution of words (amongst the earliest being e.g. Zipf (1935) or Fries and Traver (1940)). Electronic corpora appeared in the 1960s (the Brown corpus (Kucera and Francis 1967)), spawning some interesting early lines of investigation into WSD (such as Stone (1969); Weiss (1973); or Kelly and Stone (1975)). However, their use and the use of statistical empirical methods did not really take off until the mid-1980s because of the strong influence of language formalism in linguistics at the time.

¹⁹ They can be divided into enumerative ones (such as WordNet (Miller et al. 1990)), with senses explicitly listed, and generative ones (for example CORELEX (Pustejovsky et al. 1995)), with underspecified semantic information based on links to other words.

<?page no="63"?> Advances in technology and the theoretical shift towards maximalist semantics brought about an interest in a whole array of uses of KWICs (key word in context) in semantics. Most methods based on language corpora (such as, for example, the ICAME projects (Atwell 1986; Leech and Fligelstone 1992; Collier and Pacey 1997) or the AVIATOR and ACRONYM projects (Renouf 1996)) center on supervised learning, which in essence means that WSD systems work on the basis of previously manually (or semi-automatically) processed knowledge (Ide and Veronis 1998: 14). Such processing is, however, extremely costly (in terms of time and man-hours), so various attempts at solving this problem appeared, such as bootstrapping methods for automatic sense tagging (Hearst 1991) or other means of circumventing the manual processing of texts (Schütze 1993; Brown et al. 1991; Dagan and Itai 1994; Yarowsky 1992). Yet the results of these attempts are not promising: even a minimal input of supervised manual processing drastically improves the accuracy of the tagging when compared to the unsupervised automatic method (Ide and Veronis 1998: 17). The issue of data sparseness and underspecification is also a major one, and it is usually tackled by employing a whole arsenal of statistical methods, such as smoothing, class-based modeling (Brown et al. 1992; Resnik 1992), similarity-based modeling (Dagan et al. 1994; Grishman and Sterling 1993), and many other general and specialized ones. Many of these issues are directly explored in our research to follow.
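Of the statistical remedies for data sparseness just mentioned, smoothing is the easiest to illustrate. The following is a minimal sketch of additive (add-one) smoothing, assuming we want to estimate how likely a context word is to appear with one sense of a target word. The counts, the vocabulary, and the function name `smoothed_probability` are invented for illustration and are not taken from any of the cited systems.

```python
from collections import Counter


def smoothed_probability(counts, feature, vocabulary_size, alpha=1.0):
    """Additive estimate of P(feature | sense) from raw co-occurrence counts."""
    total = sum(counts.values())
    # Every feature, attested or not, receives `alpha` extra pseudo-occurrences.
    return (counts.get(feature, 0) + alpha) / (total + alpha * vocabulary_size)


# Invented toy counts of context words observed with one sense of a target word.
sense_counts = Counter({"at": 6, "up": 3, "for": 1})
vocabulary = ["at", "up", "for", "into"]  # "into" was never observed with this sense

p_seen = smoothed_probability(sense_counts, "at", len(vocabulary))
p_unseen = smoothed_probability(sense_counts, "into", len(vocabulary))
print(p_seen, p_unseen)
```

The point of the sketch is only that an unattested context word no longer receives a probability of zero, which is what makes sparse counts usable in a statistical model at all.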
4.1.2.3 Problems of computer-based approaches

All of the listed methodologies, regardless of the knowledge source they may be based on, have to face several crucial issues on the road towards a criteria-based sense distinction (Ide and Veronis 1998: 18-27):

- the quantification of the context: as we suggested in the previous lengthy discussion, the most feasible way of objectively discriminating between the senses of a polysemous word seems to be through the analysis of the contexts it appears in; and
- the objective evaluation of sense disambiguation: the problem of the meaningful and representative evaluation of one particular attempt at WSD is difficult to solve. A comparison of the various methodologies we listed would not say much about their actual objective success but rather about their mutual differences. <?page no="64"?> Hence, creating an evaluation test that could serve as a veritable semantic litmus test for each model is a challenge indeed.

The methodology investigated in our work, as the name itself might suggest, is in essence an elaborate corpus-based method broken down and performed entirely manually in order to investigate fully and record minutely its semantic reach. The default premise involves, of course, addressing these two issues of the quantification of the context and of the objective evaluation of the results.

4.1.3 The fully corpus-based disambiguation

Having reviewed the available approaches to discerning one reading of a polysemous item from another, we can say that one plausible manner of solving the problem of sense distinctness lies in the more radical corpus-based approaches. The proposition is to do away completely with predefined senses and have the corpus citations and the context within them display their own version of how meanings should be discriminated (Schütze 1998).
If we accept, following our lengthy discussion on the nature of lexical meaning in Chapter 2, that a word sense corresponds to a cluster of corpus citations all containing the same reading of a given lexeme (representing its salient context of use), the two issues noted above need to be decided on. One is which criteria to use in order to establish that the citations really contain the same reading of the given lexeme. The other is how to properly and objectively evaluate and confirm the sense distinctions obtained and defined in such a manner.

The criteria to be used in order to define corpus citation clusters will be the microcontextual information on the distributional properties of each reading. Two major divisions of such approaches can be recognized: Bag-of-words approaches and Relational information approaches (Ide and Veronis 1998: 18). The Bag-of-words approaches, to which our corpus-based semantics belongs, look at a certain window of words around the target lexeme and investigate their relationships with it in terms of semantic and syntactic interplay, distance, or collocational ties. Most WSD work uses, as we also intend to, the microcontext as its window of analysis, focusing on the local surroundings of the target word, at most the sentence in which it appears (as in Atkins (1987); and Leacock et al. (1998)). Some approaches employ a wider topical context which encompasses words that co-occur with a <?page no="65"?> given reading of the target lexeme within a distance of several sentences or even paragraphs, all seen as dealing with the same identified topic (such as in Yarowsky (1992); or Voorhees et al. (1995)). Sometimes an even wider topical area is taken as the context window, comprising a whole domain and stemming directly from the microglossaries employed in early MT attempts (Ide and Veronis 1998: 19).
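As an illustration only, the notion of a microcontext window can be sketched in a few lines of Python. The function name `context_window`, the window size, and the whitespace tokenization are assumptions made for this sketch and are not part of the manual procedure followed in this book. The sentence is a citation from the corpus sample (example [8] in Section 4.2).

```python
from collections import Counter


def context_window(tokens, target_index, size=3):
    """Return up to `size` tokens on each side of the target, excluding the target."""
    left = tokens[max(0, target_index - size):target_index]
    right = tokens[target_index + 1:target_index + 1 + size]
    return left + right


# Naive whitespace tokenization of a corpus citation (an assumption of the sketch).
sentence = "Then she looks at the teacher , who also wears glasses .".split()
target = sentence.index("looks")

window = context_window(sentence, target, size=2)
# A bag of words ignores order and keeps only which words occur and how often.
bag = Counter(word.lower() for word in window)
print(window)
```

Widening `size` until it covers several sentences would move the sketch from the microcontext towards the topical context discussed above.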
Research shows that for the purposes of WSD the use of topical context is not superior to the use of the microcontext alone (see Leacock et al. 1998) but should perhaps be seen only as a different extension of the contextual window to investigate. In essence, the model we propose is a statistical (corpus-based) Bag-of-words approach to semantic analysis focusing on the microcontext. The information necessary for our analysis has already been obtained through steps 1, 2, and 3 of our corpus-based analysis (described in Chapter 3). This information will be analyzed in terms of meaningful patterns indicating solid distributional ties between the microcontext and the 42 (or, as we will see, 35) readings of the verb look. Evaluation of the actual meaningfulness of the patterns identified in such a way will be twofold. One facet will be the traditional qualitative analysis of the disambiguation results. The second facet is quantitative and involves utilizing standard computational-linguistic statistical models and a standard in vitro word sense induction and disambiguation task based on an independent corpus annotated in the same way (the CASIS Example Corpus), together with a comparison with the inter-annotator agreement (ITA) statistical boundary (Navigli 2009: 44-51).

4.2 Qualitative analysis

The starting point for debating the many senses of the verb look is the set of 42 fine-grained dictionary senses outlined in Table 3.1. A more in-depth qualitative analysis of their actual structure and choice is necessary before the results of the corpus-based investigation can be discussed and compared.
The first important thing to note here is that, out of the 42 senses which emerged as a result of combining and paraphrasing various dictionary entries, the following 7 have not been identified in the corpus sample at all:

- ‘have a downcast or mournful look’;
- ‘express emotion (anger) by look or glance’ (look daggers at);
<?page no="66"?>
- ‘suggest to someone to be quick at doing something’ (look alive; look lively; look sharp);
- ‘do not act before considering the possible consequence’ (look before you leap);
- ‘appear weak or unimportant’ (look small);
- ‘search for and produce something’ (look something out); and
- ‘question a gift or a favor received’ (look a gift horse in the mouth).

This fact already gives us a certain indication of the limitations of the methodology and will as such be picked up further on in the discussion. At this point, however, because they do not appear in the data, the 7 senses which have not been attested in the corpus sample will not be given further attention until the evaluation stage at the very end.

Looking at the senses that have in fact been attested, there are several which can be seen as more crucial if we were to observe the meaning of look in a more coarse-grained way. The most prominent of them is perhaps ‘direct your gaze towards someone or something or in a specified direction’, given here in example [8]:²⁰

Example [8] Then she looks at the teacher, who also wears glasses.

The reason why this sense can be considered more important than other readings, if we can state it like that, is that it can be recognized as semantically related to a number of other senses, or rather to all of them. The first sense that can be seen as directly connected to the ‘direct your gaze towards someone or something or in a specified direction’ sense is ‘for a structure or part of body to have a view or an outlook in a specified direction’.
It appears to be basically an extension of the original meaning, this time requiring an inanimate subject, usually a structure (if we can also consider a part of the body as such) or a location of some kind (seen in example [9]).

Example [9] […] shows the Boca Tigris [the Pearl River delta] looking south, with a ship exiting the river.

²⁰ All of the examples that follow have been taken from the original corpus sample used in the presented analysis of the verb look.

<?page no="67"?> Other more transparent extensions of the ‘direct your gaze towards someone or something or in a specified direction’ sense are ‘ignore someone or something by pretending not to notice them’ (example [10]); ‘observe an event without being involved’ (example [11]); and ‘think of or regard in a particular manner’ (example [12]), the last of which involves directing a supposed mental gaze (or scrutiny) rather than the physical one:

Example [10] […] his eyes didn't focus but seemed to look through me into the whirling blades of the fan in the far corner.
Example [11] The emcee thrusts the microphone at her, looking on in amusement as she speaks into it breathlessly.
Example [12] Looking at the mitigating factors, it seems to me that the only thing […].

Metaphorical extensions of the ‘direct your gaze towards someone or something or in a specified direction’ sense are also, for the most part, relatively transparent in their obvious link to it. They incorporate the original kernel meaning of ‘directing your gaze’ towards someone or something, if only metaphorically.
These senses include:

- ‘express a perceived air of superiority’ (seen in example [13]), extended from the original meaning by directing your gaze within the conceptual mapping of OBSERVING IS JUDGING;
- ‘view someone with superiority’ (example [14]), which follows the same pattern as the previous sense;
- ‘observe someone without showing embarrassment or fear’ (example [15]);
- ‘ignore wrongdoing’ (example [16]);
- ‘make future plans’ (example [17]);
- ‘expect (hope) to do something’ (example [18]);
- ‘eagerly await something or someone’ (example [19]);
- ‘reminisce past events’ (example [20]);
- ‘suffer a setback’ (example [21]);
- ‘pay a short visit’ (example [22]);
- ‘quickly take notice’ (example [23]);
- ‘have respect for someone’ (example [24]);
- ‘rely on someone or something’ (example [25]);
- ‘bring an improvement to a situation’ (example [26]);
- ‘express or show something to someone’ (example [27]); and also
<?page no="68"?>
- ‘take care of someone or something’ (example [28]), being perhaps extended from the core sense through the OBSERVING IS GUARDING conceptual mapping.

Example [13] He lifted his weak chin to look down his nose at me or perhaps to mime nobility.
Example [14] […] while applied research is often looked down upon as mere “tool building”.
Example [15] Instead, we can only peer in from the countryside, look our subject straight in the face and wonder, ever wonder.
Example [16] A classic example was with the Taliban - they started with violence against women, and everyone looked the other way...
Example [17] Any movement for social change looks to a better future, but it makes its case by […].
Example [18] But, but you said this week you thought there were indications he was looking to get out.
Example [19] […] and the city of Chicago has looked forward to furthering our relationship with South Africa.
Example [20] Looking back, as I often do, for all of my expertise […].
Example [21] Or one of those rappers who go Hollywood and never look back.
Example [22] Before he left Alderton, Julian looked in on Dipper, who was now sharing his cramped, unwholesome quarter […].
Example [23] “Look out!” cried Rowena.
Example [24] She's an inspirational woman to look up to.
Example [25] Citizens, looking to their comprehensive doctrines, view the political […].
Example [26] Things are looking up for Malden Mills, earnings have grown, and the company says it expects […].
Example [27] Jehanneh looked the question at me.
Example [28] Ares looks after her, seized with the fear he often has that she is walking away […].

If we were then to organize the senses in terms of their mutual semantic similarity, the first cluster of senses, which we can say centers directly around the ‘direct your gaze towards someone or something or in a specified direction’ sense, would be as in Figure 4.1.

<?page no="69"?>
Figure 4.1 The first clustering of senses centering on ‘direct your gaze towards someone or something or in a specified direction’.²¹

²¹ The field housing the ‘think or regard in a particular manner’ sense in Figure 4.1 above and the similarly represented fields in subsequent Figures 4.2, 4.3, 4.4, and 4.5 are intentionally drawn with different broken lines and thus specially marked because they extend further into a new cluster and spawn other related senses.
[Figure 4.1 node labels: direct your gaze towards someone or something or in a specified direction; for a structure or part of body to have a view or outlook in a specified direction; observe someone without showing embarrassment or fear; rely on someone or something; have respect for someone; eagerly await something or someone; pay a short visit; observe an event without being involved; ignore someone or something by pretending not to notice them; suffer a setback; bring an improvement to a situation; take care of someone or something; expect (hope) to do something; quickly take notice; reminisce past events; make future plans; express a perceived air of superiority; view someone with superiority; ignore wrongdoing; think or regard in a particular manner; express or show something to someone]

<?page no="70"?> The second clustering of senses, even though not completely unrelated to the previous cluster centering around the ‘direct your gaze towards someone or something or in a specified direction’ sense (it is plausibly linked to this cluster via the ‘think of or regard in a particular manner’ sense), can be seen as congregating around ‘examine and consider’ (example [29]). The branching senses include: ‘move round in order to investigate something’ (example [30]); ‘inspect something briefly’ (example [31]); ‘evaluate someone or something with a quick glance’ (example [32]); ‘investigate in great detail’ (example [33]); ‘evaluate someone carefully’ (example [34]); and arguably ‘peruse a book or other written material’ (example [35]).

Example [29] NATO is now looking at three or four different options […].
Example [30] All eight of them emerge, one by one, looking around them.
Example [31] If you look over the last 20 years, this place is totally altered.
Example [32] “Some do”, he said, looking at her quickly as if to gauge whether she would.
Example [33] […] who have worked longest, looking into the possibility of a larger pattern of evasion, an effort to […].
Example [34] “You haven't slept”, Milt chided, looking me up and down from the other side of my screen door […].
Example [35] Kimbal takes out a cigarette and lights it, sucking in hard while looking through his notebook… […].

The third, also a related cluster (via the original ‘direct your gaze towards someone or something or in a specified direction’ sense to the first cluster and via the ‘examine and consider’ sense to the second one), is the one centering around the ‘attempt to find’ sense (example [36]). It includes two related senses: ‘search for and find particular information in a piece of writing’ (example [37]); and, once more arguably, ‘pay a social visit while going somewhere else’ (example [38]).

Example [36] […] that this was the magic potion she had been looking for.
Example [37] but then I started looking things up, what the hell.
Example [38] […] I am so happy about the teachers and the nuns who looked me up after all these years.

These two clusters are graphically represented by Figures 4.2 and 4.3.

<?page no="71"?>
Figure 4.2 The second clustering of senses centering on ‘examine and consider’.
[Figure 4.2 node labels: direct your gaze towards someone or something or in a specified direction; move round in order to investigate something; investigate in great detail; evaluate someone or something with a quick glance; examine and consider; inspect something briefly; think or regard in a particular; evaluate someone carefully; peruse a book or other written material]

Figure 4.3 The third clustering of senses centering on ‘attempt to find’.
[Figure 4.3 node labels: direct your gaze towards someone or something or in a specified direction; examine and consider; attempt to find; think or regard in a particular manner; search for and find particular information in a piece of writing; peruse a book or other written material]

The last cluster (represented by Figure 4.4) congregates around the ‘have the appearance of’ sense (example [39]) and cannot be seen as directly related to the other three clusters, all interlinked through the ‘direct your gaze towards someone or something or in a specified direction’ sense. It <?page no="72"?> can perhaps be understood as a case of semantic reversal of the core sense into a meaning reflecting what that gaze can ultimately access and see. Other members of the cluster include ‘show likelihood of’ (example [40]); ‘appear your usual self’ (example [41]); and ‘have the appearance of being as old as you are’ (example [42]).

Example [39] He looks like a movie star.
Example [40] It looks like we will end up with a choice between two candidates […].
Example [41] She'd have looked more herself in them anyway. She doesn't look herself.
Example [42] “You're beginning to look your age,” Julian told him.

Figure 4.4 The fourth clustering of senses centering around ‘have the appearance of’.
[Figure 4.4 node labels: direct your gaze towards someone or something or in a specified direction; have the appearance of; show likelihood of; appear your usual self; have the appearance of being as old as you are]

The analysis presented above is a purely descriptive exercise in a subjective recognition of the semantic (mostly metaphorical) links between the multitude of senses of the verb look (all put together in Figure 4.5).

<?page no="73"?>
Figure 4.5 The total non-criteria based division of the senses of the verb look.
[Figure 4.5 node labels: direct your gaze towards someone or something or in a specified direction; for a structure or part of body to have a view or outlook in a specified direction; observe someone without showing embarrassment or fear; rely on someone or something; eagerly await something or someone; pay a short visit; observe an event without being involved; ignore someone or something by pretending not to notice them; suffer a setback; bring an improvement to a situation; take care of someone or something; expect (hope) to do something; quickly take notice; reminisce past events; make future plans; express a perceived air of superiority; view someone with superiority; ignore wrongdoing; think or regard in a particular manner; express or show something to someone; move round in order to investigate something; investigate in great detail; evaluate someone or something with a quick glance; examine and consider; inspect something briefly; evaluate someone carefully; peruse a book or other written material; examine and consider; search for and find particular information in a piece of writing; pay a social visit while going somewhere else; have the appearance of; show likelihood of; appear your usual self; have the appearance of being as old as you are; have respect for]

<?page no="74"?> The previous deliberations and the information on the interactions and extensions of the different senses of look, and on their apparent hierarchical order of generality seen in Figure 4.5, are based on pure introspection. As representative of the most commonly used manner of sense disambiguation, they will serve as a means of comparison and evaluation for our more formalized and empirically based corpus-linguistic account.
4.3 Quantitative analysis

According to the theoretical background of corpus-based semantics (given in Chapter 2), and to restate it here, the semantic content of a lexeme can be, and to a large degree is, determined by the syntactic, semantic, and morphological context with which the readings of the lexeme co-occur with sufficient frequency. In practice this means that if we can observe and record a statistically significant linguistic pattern surrounding one particular sense of a lexeme, and contrast it successfully with the contextual patterns in which other senses appear, then we can reverse the process and objectively determine any given sense of a lexeme from the previously attested contextual distribution. Through steps 1, 2, and 3 of the corpus-based analysis (taken earlier in Chapter 3) several sets of such potentially pattern-revealing data have been obtained.

The first important set comprises the raw frequencies of occurrence of the senses themselves, given in Table 4.1. The first thing to notice when looking at these raw frequencies is the enormous discrepancy between the most and the least frequently appearing senses, and the apparent statistical irrelevance of the ones appearing only a few times, or even only once. This data sparseness problem is caused by the persistent difficulty of obtaining enough data on less frequent (mostly idiomatic) senses from a corpus sample of practically any size. We intended to compensate for this already familiar problem in corpus-based semantic research (Ide and Veronis 1998: 24) by using a rather large and diverse corpus sample (more than 15,000 sentences in the original raw sample) taken from a big representative corpus (COCA). However, Table 4.1 clearly shows that the attempt was not successful when it came to encompassing all of the supposed semantic scope of the verb look.
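The scale of this discrepancy can be checked directly from the raw frequencies reported in Table 4.1. The short sketch below is illustrative only: the variable names are ours, and the threshold of two occurrences for counting a sense as rare is an assumption chosen for the illustration rather than a criterion used in the analysis.

```python
# Raw sense frequencies as listed in Table 4.1 (ranks 1 to 35).
frequencies = [
    5742, 3045, 1860, 894, 240, 217, 173, 165, 129, 124, 110, 105,
    38, 37, 37, 34, 29, 27, 25, 20, 18, 16, 11, 5, 4,
    2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
]

total = sum(frequencies)
# Share of all attested occurrences covered by the four most frequent senses.
top_four_share = sum(frequencies[:4]) / total
# Number of senses attested no more than twice (assumed threshold for "rare").
rare_senses = sum(1 for frequency in frequencies if frequency <= 2)

print(total, round(top_four_share, 3), rare_senses)
```

On these figures the four most frequent senses account for close to 88 per cent of the 13,119 attested occurrences, while ten of the 35 senses occur no more than twice.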
What remains for our further analysis is to concentrate on the data obtained on criteria other than the frequency of occurrence of the senses, and to see whether they are affected by a similar problem.

<?page no="75"?>
| R | SENSE | FREQ. |
|---|---|---|
| 1. | direct your gaze towards someone or something or in a specified direction | 5,742 |
| 2. | have the appearance of | 3,045 |
| 3. | attempt to find | 1,860 |
| 4. | think of or regard in a particular manner | 894 |
| 5. | examine and consider | 240 |
| 6. | move round in order to investigate something | 217 |
| 7. | investigate in great detail | 173 |
| 8. | show likelihood of | 165 |
| 9. | eagerly await something or someone | 129 |
| 10. | rely on someone or something | 124 |
| 11. | take care of someone or something | 110 |
| 12. | evaluate someone carefully | 105 |
| 13. | view someone with superiority | 38 |
| 14. | peruse a book or other written material | 37 |
| 15. | ignore wrongdoing | 37 |
| 16. | inspect something briefly | 34 |
| 17. | quickly take notice | 29 |
| 18. | search for and find particular information in a piece of writing | 27 |
| 19. | for a structure or part of body to have a view or outlook in a specified direction | 25 |
| 20. | observe an event without getting involved | 20 |
| 21. | have the appearance of being as old as you are | 18 |
| 22. | have respect for someone | 16 |
| 23. | suffer a setback | 11 |
| 24. | make future plans | 5 |
| 25. | reminisce past events | 4 |
| 26. | appear your usual self | 2 |
| 27. | express a perceived air of superiority | 2 |
| 28. | bring an improvement to a situation | 2 |
| 29. | pay a social visit while going somewhere else | 2 |
| 30. | express or show something to someone | 1 |
| 31. | ignore someone or something by pretending not to notice them | 1 |
| 32. | evaluate someone or something with a quick glance | 1 |
| 33. | expect (hope) to do something | 1 |
| 34. | observe someone without showing embarrassment or fear | 1 |
| 35. | pay a short visit | 1 |

Table 4.1 Raw frequencies and ranks of occurrence of the 35 attested senses of the verb look.²²

²² The R in column one indicates rank.

<?page no="76"?>
Table 4.2 Senses ranked by feature types.

| R | SENSES | FEAT. TYPE | SENSE FREQ. |
|---|---|---|---|
| 1. | have the appearance of | 181 | 3,045 |
| 2. | direct your gaze towards someone or something or in a specified direction | 176 | 5,742 |
| 3. | attempt to find | 154 | 1,860 |
| 4. | think of or regard in a particular manner | 144 | 894 |
| 5. | examine and consider | 129 | 240 |
| 6. | investigate in great detail | 114 | 173 |
| 7. | eagerly await something or someone | 107 | 129 |
| 8. | take care of someone or something | 104 | 110 |
| 9. | show likelihood of | 103 | 165 |
| 10. | move round in order to investigate something | 98 | 217 |
| 11. | rely on someone or something | 98 | 124 |
| 12. | for a structure or part of body to have a view or outlook in a specified direction | 83 | 25 |
| 13. | evaluate someone carefully | 80 | 105 |
| 14. | view someone with superiority | 72 | 38 |
| 15. | peruse a book or other written material | 68 | 37 |
| 16. | ignore wrongdoing | 65 | 37 |
| 17. | search for and find particular information in a piece of writing | 62 | 27 |
| 18. | inspect something briefly | 60 | 34 |
| 19. | quickly take notice | 57 | 29 |
| 20. | have the appearance of being as old as you are | 51 | 18 |
| 21. | suffer a setback | 47 | 11 |
| 22. | have respect for someone | 47 | 16 |
| 23. | observe an event without getting involved | 42 | 20 |
| 24. | make future plans | 30 | 5 |
| 25. | pay a social visit while going somewhere else | 24 | 2 |
| 26. | express or show something to someone | 23 | 1 |
| 27. | evaluate someone or something with a quick glance | 23 | 1 |
| 28. | express a perceived air of superiority | 23 | 2 |
| 29. | reminisce past events | 23 | 4 |
| 30. | pay a short visit | 22 | 1 |
| 31. | expect (hope) to do something | 19 | 1 |
| 32. | bring an improvement to a situation | 19 | 2 |
| 33. | ignore someone or something by pretending not to notice them | 18 | 1 |
| 34. | appear your usual self | 18 | 2 |
| 35. | observe someone without showing embarrassment or fear | 12 | 1 |

<?page no="77"?>
| R | SENSES | FEAT. TOKEN | SENSE FREQ. |
|---|---|---|---|
| 1. | direct your gaze towards someone or something or in a specified direction | 119,314 | 5,742 |
| 2. | have the appearance of | 51,158 | 3,045 |
| 3. | attempt to find | 38,266 | 1,860 |
| 4. | think of or regard in a particular manner | 19,264 | 894 |
| 5. | examine and consider | 5,545 | 240 |
| 6. | move round in order to investigate something | 3,743 | 217 |
| 7. | investigate in great detail | 3,235 | 173 |
| 8. | rely on someone or something | 2,604 | 124 |
| 9. | eagerly await something or someone | 2,504 | 129 |
| 10. | show likelihood of | 2,341 | 165 |
| 11. | evaluate someone carefully | 1,936 | 105 |
| 12. | take care of someone or something | 1,727 | 110 |
| 13. | view someone with superiority | 846 | 38 |
| 14. | peruse a book or other written material | 784 | 37 |
| 15. | inspect something briefly | 648 | 34 |
| 16. | ignore wrongdoing | 645 | 37 |
| 17. | search for and find particular information in a piece of writing | 548 | 27 |
| 18. | for a structure or part of body to have a view or outlook in a specified direction | 535 | 25 |
| 19. | quickly take notice | 490 | 29 |
| 20. | have respect for someone | 374 | 16 |
| 21. | observe an event without getting involved | 302 | 20 |
| 22. | have the appearance of being as old as you are | 266 | 18 |
| 23. | suffer a setback | 187 | 11 |
| 24. | make future plans | 101 | 5 |
| 25. | reminisce past events | 48 | 4 |
| 26. | express a perceived air of superiority | 33 | 2 |
| 27. | pay a social visit while going somewhere else | 31 | 2 |
| 28. | appear your usual self | 29 | 2 |
| 29. | bring an improvement to a situation | 28 | 2 |
| 30. | express or show something to someone | 23 | 1 |
| 31. | evaluate someone or something with a quick glance | 23 | 1 |
| 32. | pay a short visit | 22 | 1 |
| 33. | expect (hope) to do something | 19 | 1 |
| 34. | ignore someone or something by pretending not to notice them | 18 | 1 |
| 35. | observe someone without showing embarrassment or fear | 12 | 1 |

Table 4.3 Senses ranked by feature tokens.

<?page no="78"?>
4.3.1 Distinctiveness and predictive power

As we saw before, a total of 200 (115 predefined and up to 85 non-predefined) contextual features were used to describe the possible distributional patterns of the 35 attested senses of the verb look. They have, in turn, been attested with the various senses as 2,396 types and as 257,649 tokens.
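The distinction between feature types and feature tokens can be made concrete with a small sketch. It assumes that a type is a distinct contextual feature attested with a given sense, and that a token is each individual co-occurrence of a feature with a citation of that sense. This reading is consistent with the per-sense figures in Tables 4.2 and 4.3, which sum to 2,396 and 257,649 respectively. The annotations below are invented toy data, and `types_and_tokens` is a hypothetical helper, not the tool used in the analysis.

```python
from collections import Counter, defaultdict


def types_and_tokens(annotations):
    """Map each sense to (distinct features attested, total feature occurrences)."""
    per_sense = defaultdict(Counter)
    for sense, features in annotations:
        per_sense[sense].update(features)
    return {sense: (len(counts), sum(counts.values()))
            for sense, counts in per_sense.items()}


# Invented toy annotations: (sense label, contextual features of one citation).
annotations = [
    ("direct your gaze", ["Voice: active", "Subject: human"]),
    ("direct your gaze", ["Voice: active", "Subject: animate"]),
    ("have the appearance of", ["Voice: active", "Subject: concrete"]),
]

summary = types_and_tokens(annotations)
total_types = sum(types for types, _ in summary.values())
total_tokens = sum(tokens for _, tokens in summary.values())
print(summary, total_types, total_tokens)
```

In the toy data 'Voice: active' counts once as a type for each sense it is attested with, but as a token every time it is tagged, which is why token totals grow much faster than type totals.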
CONTEXTUAL FEATURE TYPE | WIDTH OF OCCURRENCE
Voice: active | 35
Syntactic features: declarative | 35
Subject: concrete | 35
Subject: countable | 35
Subject: simple | 35
Syntactic features: main | 33
Aspect: simple | 33
Syntactic features: transitive | 33
Subject: human | 33
Subject: animate | 33
Aspect: perfect progressive | 1
Subject: machine | 1
Head: location | 1
Head: simple | 1
Head: compound | 1
Head: phrase | 1
Head: clause | 1
Head: singular | 1
Complement: machine | 1
Complement: possessive | 1
Table 4.4 The ten most widely occurring and ten least widely occurring predefined feature types. 23

23 Tables 4.4 and 4.5 do not include the several predefined features which did not appear with any of the senses as they have a value of 0 in all related measurements and hence have no effect on the given analysis.

Tables 4.2 and 4.3 above give us the frequencies and ranks of the senses with the most types and the most tokens of attested contextual features. As the tables show, both the number of types and the number of tokens of the attested contextual features (tags) seem to promise considerable statistical relevance, if only because of the significant contextual patterns they appear to indicate. However, we need to scratch beneath the surface in order to properly understand what this data is telling us. There are two ways of observing the evident interplay between the contextual features and the readings. One perspective is illustrated by the example given in Table 4.4. It tells us that some of the 200 relevant features appeared in the corpus sample at least once with every attested sense (the top 10 in Table 4.4) while some features appeared at least once with only one sense (the bottom 10 in Table 4.4).

CONTEXTUAL FEATURE | TOKEN FREQ. OF OCCURRENCE
Voice: active | 13,075
Aspect: simple | 12,016
Syntactic features: declarative | 11,938
Subject: countable | 10,486
Subject: concrete | 10,375
Subject: simple | 10,209
Subject: animate | 9,209
Subject: human | 9,139
Syntactic features: transitive | 8,993
Syntactic features: main | 7,838
Complement: quantity | 5
Object: machine | 4
Head: location | 3
Head: expletive | 2
Subject: machine | 1
Head: proper | 1
Head: compound | 1
Head: clause | 1
Complement: machine | 1
Aspect: perfect progressive | 1
Table 4.5 The ten most frequently occurring and ten least frequently occurring predefined feature tokens.

The other perspective on the relationship between the contextual features and the readings is given in Table 4.5. The information presented here indicates that some of the contextual features (such as ‘Voice: active’ or ‘Aspect: simple’) appeared as many as 13,000 times with various senses, while others occurred merely once or twice.

What the tables communicate is that both the range of senses with which a given feature interplays and the total frequency with which it appears with those senses play a role in determining a contextual plot one reading is habitually invited to fill. The reason why both dimensions of this interplay have to be observed together may be clearer if we elaborate on an example of two contextual features, for instance ‘Voice: active’ and ‘Head: compound’. The contextual feature ‘Voice: active’ appears with all 35 senses and it does so 13,075 times. Hence, we can safely say that it interplays quite widely with different senses and in sufficient frequency. The frequency of its occurrence (or rather co-occurrence) contributes significantly to its influence in suggesting a meaningful pattern of the contextual distribution of a given sense (or senses) with which it was so frequently identified.
However, its high level of interplay (appearing with each of the 35 identified senses) reduces that influence because it means that it cannot be seen as specifically marking any particular sense (or senses). On the other hand, the feature ‘Head: compound’ appears with only one sense and only one time. The bonus for the semantic influence this particular tag can be seen as having is definitely in the fact that it appears exclusively with one reading. The factor that reduces this influence is its statistically almost insignificant frequency of appearance. It was identified only once in 13,119 sentences, which can ultimately be interpreted as only a chance co-occurrence with one given reading and nothing more telling than that. It becomes obvious that in order to mark a feature as influential in defining a contextual pattern we must look for contextual features which appear with one sense, or as few senses as possible, and which also appear frequently enough with that one or those few senses. 24

24 The classification method suggested by this methodology is linked to computer linguistic principles of using the previously mentioned weighted context vectors (Navigli 2009: 26) or two-way matrices (Baroni and Lenci 2010: 4).

The former dimension of this combination, the interplay value of that feature, we can call distinctiveness. Since there are 35 attested senses, a feature’s distinctiveness can have a positive value from the minimum 1 to the maximum of 35, or also 0 if it was a predefined feature which was not attested with any of the senses. Hence, distinctiveness is calculated by dividing the total number of senses (35) by the number of senses with which the given feature appears (from 0 to 35). Each feature then has one distinctiveness value to be considered for the whole range of senses (as the value relates to all of them) and is obtained easily following the equation below (distinctiveness values for all 200+1 features are given in Appendix 5):

$$\text{Distinctiveness} = \frac{\text{total number of senses (35)}}{\text{number of senses with which the feature appears}}$$

The other factor in this combination of influences is the frequency at which the given contextual feature appears with a given sense for which that influence is being measured. However, before having a peek at the frequency values it is important to realize that this must be done separately for the predefined features (115 of them plus the attested sense) and the non-predefined features. The predefined ones are at a fixed number for all of the senses, meaning that every microcontext was tested for the presence or absence of all 115 of them. The non-predefined features vary depending on whether any were identified or not (and could go from 0 to 85). The maximum of 85 non-predefined features co-occurring with one sense was obtained by eliminating less quantitatively relevant appearances of non-predefined contextual features. Not every co-occurrence of two lexemes is to be seen as a meaningful link, as we saw that not every pattern is a pattern. The cut-off point for considering a frequency of a co-occurring non-predefined feature (L1, R1, L2 or R2 collocates and frequently appearing subordinators) as statistically relevant was that it had to have a minimum frequency of occurrence value equal to at least 10 percent of the token maximum measured for any non-predefined feature. In our case the maximum token frequency measured for one non-predefined feature was 4,725 occurrences of the supposed L1 collocate ‘at’. If we follow our statistical benchmark of 10% of the attested token maximum, it follows that no non-predefined feature under the frequency of 47.25 (or rather 47) will be considered as being relevantly linked to a given sense. This brings the number down from an actual total of 372 attested non-predefined feature types to up to 85 considered.
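The two calculations just described can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure actually run for the study: the function names are hypothetical, the toy token counts (apart from the 4,725 occurrences of ‘at’) are invented, and the cut-off fraction is left as a parameter, since a cut-off of 47.25 corresponds to 0.01 of the stated maximum of 4,725.

```python
N_SENSES = 35  # attested senses of the verb "look", as stated in the text


def distinctiveness(senses_with_feature: int, n_senses: int = N_SENSES) -> float:
    """Total number of senses divided by the number of senses the feature appears with."""
    if not 0 <= senses_with_feature <= n_senses:
        raise ValueError("sense count must lie between 0 and the number of senses")
    if senses_with_feature == 0:
        return 0.0  # predefined feature that was never attested with any sense
    return n_senses / senses_with_feature


def frequent_enough(token_counts: dict, fraction: float) -> dict:
    """Keep the non-predefined features whose token count reaches fraction * maximum.

    `fraction` is an assumption supplied by the caller, not a value fixed here.
    """
    if not token_counts:
        return {}
    cutoff = fraction * max(token_counts.values())
    return {feature: n for feature, n in token_counts.items() if n >= cutoff}


# Illustrative counts only; 4,725 for 'L1: at' is the one figure taken from the text.
toy_counts = {"L1: at": 4725, "R1: for": 500, "L2: storming": 3}
kept = frequent_enough(toy_counts, fraction=0.01)
```

A feature attested with all 35 senses receives the minimum distinctiveness of 1, a feature attested with a single sense receives the maximum of 35, and an unattested predefined feature is given 0 rather than triggering a division by zero.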
That is why we must deal differently with the data regarding predefined and non-predefined contextual features. Since the predefined ones were obligatorily tested as positive (present) or negative, zero value (absent), in their case the absence also had to be seen as a building-block of the patterns influencing sense selection. Hence, all of them had to be considered, regardless of their (low or non-existent) frequency. With the non-predefined ones, a zero value and an absence of a feature such as a collocate or a subordinator tells us nothing as there was no predefined list for all of the senses. One special feature is the sense itself - it has its own distinctiveness at the maximum of 35, being in each case unique, its own frequency values, and it serves as the corner-stone of the entire set of training data. Having defined the differences in observing frequency information for different types of contextual data, we can finally focus on the gathered data. The maximally attested frequency of types with one sense was somewhat lower than the possible 200 (or rather 201 if we also include the senses) - it was 181 (96 predefined features including the sense and 85 non-predefined feature types), all identified as co-occurring with the ‘have the appearance of’ sense. The total token value for the features appearing with one individual sense ranged from the maximum of 119,314 to only 12 (as we can see in Tables 4.2 and 4.3). To further statistically optimize the frequency data, the influence of the total number of observations of a feature across all senses must be taken into account, so we will be using relative frequency 25 values in further calculations. 26 Now, as we saw in the previous paragraph, the idea is to find among these the features which appear in sufficient frequency but are also distinctive enough.
The value of this combination of factors we will term the predictive power of a feature regarding each given sense (all values are listed in Appendices 6 and 7). Table 4.6 below illustrates the ways in which the two possible factors making up the predictive power can interplay, column one indicating the perceived strength of their semantic influence in conditioning a given reading.

STRENGTH | FREQUENCY | DISTINCTIVENESS
min | low frequency | low distinctiveness
 | medium frequency | low distinctiveness
 | low frequency | medium distinctiveness
 | high frequency | low distinctiveness
 | low frequency | high distinctiveness
 | medium frequency | medium distinctiveness
 | high frequency | medium distinctiveness
 | medium frequency | high distinctiveness
max | high frequency | high distinctiveness
Table 4.6 The possible interplay of distinctiveness and frequency in constituting predictive power values.

25 Relative frequency is calculated by dividing the total token frequency of a feature with each individual frequency of its co-occurrence with each of the senses.

26 All of the statistical processing of the data presented in the paper was conducted using Microsoft Excel and SPSS V.20 with the support of Mag. Dr. Hermann Cesnik at the Alpen-Adria-Universität Klagenfurt.

We can see from the illustrative scale above that the higher the frequency of co-occurrence of a contextual feature with a given sense and the higher the distinctiveness of that feature are, the stronger its contextual conditioning will be. Hence, we can easily deduce how to calculate the predictive power of any given microcontext - we simply multiply the two relevant factors of relative frequency and distinctiveness:

$$\text{Predictive power} = \text{relative frequency} \times \text{distinctiveness}$$

Additionally, having the predictive powers measured across the total number of senses standardizes their values, giving them a possible range of between 0 and 1 (0 being the minimum and indicating no predictive power and 1 being the maximum predictive power any contextual feature can have).
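As a rough illustration of how the product of the two factors can be kept within the 0 to 1 range, the sketch below rests on two assumptions that the text does not spell out: that relative frequency is expressed as a share between 0 and 1, and that the standardization amounts to dividing by the number of senses (35). The function name is hypothetical.

```python
N_SENSES = 35  # number of attested senses reported in the text


def predictive_power(relative_frequency: float, distinctiveness: float,
                     n_senses: int = N_SENSES) -> float:
    """Relative frequency times distinctiveness, scaled into the 0-1 range.

    Assumptions (not confirmed by the source): relative_frequency is a share
    between 0 and 1, and the scaling step is a division by the number of senses.
    """
    if not 0.0 <= relative_frequency <= 1.0:
        raise ValueError("relative_frequency is assumed to lie between 0 and 1")
    if not 0.0 <= distinctiveness <= n_senses:
        raise ValueError("distinctiveness must lie between 0 and the number of senses")
    return relative_frequency * distinctiveness / n_senses


# A fully distinctive feature (35) that always co-occurs with its sense reaches the maximum.
strongest = predictive_power(1.0, 35.0)
# A feature shared by all 35 senses (distinctiveness 1) stays weak whatever its frequency.
weakest_shared = predictive_power(1.0, 1.0)
```

Under these assumptions the sense itself, which is unique to its own reading, would receive the maximum value of 1, while a feature such as ‘Voice: active’, attested with all 35 senses, could never exceed 1/35.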
Combining the values of the predictive powers of each of the attested contextual features and linking them to one given reading, we obtain a quantifiable value of the total conditioning strength of any particular context surrounding the target lexeme. This combined value of predictive powers we can call the prediction ratio of a given context. It is calculated simply by adding all of the predictive powers of all of the features co-occurring with the individual sense at hand in a given context. The prediction ratios should be further modified by assuming that the 35 attested senses constitute a total possible polysemy spectrum of the verb look (for practical reasons, disregarding here the obvious semantic open-endedness of any lexeme (Onysko 2011)). So, in effect, the prediction ratios we present in Table 4.7 below illustrate their values in contexts which would contain all of the 12 to 181 types and 12 to 119,314 tokens we originally attested for the 35 different attested readings of the verb look. Normally, in any practical disambiguation task the process of calculating a prediction ratio would be an on-line one (something similar to the one we supposed as present in human-based disambiguation) and would solely depend on the (linguistic) context and its type/token make-up. The maximal possible prediction ratio given in the last column of the table refers to the ideal situation of every attested feature for every sense having a maximum predictive power of 1 (which is, in fact, a theoretical and practical impossibility). It serves as a measure of comparison for each prediction ratio we obtain and already indicates problems we will face when we embark on our disambiguation.

R SENSES | PRED. RAT. | THEOR. MAX
1. have the appearance of | 36 | 181
2. direct your gaze towards someone or something or in a specified direction | 28.95 | 176
3. attempt to find | 7.42 | 154
4. think of or regard in a particular manner | 5.42 | 144
5. eagerly await something or someone | 3.57 | 107
6. evaluate someone carefully | 3.16 | 80
7. investigate in great detail | 2.46 | 114
8. move round in order to investigate something | 2.46 | 98
9. rely on someone or something | 1.80 | 98
10. show likelihood of | 1.78 | 103
11. examine and consider | 1.62 | 129
12. peruse a book or other written material | 1.46 | 68
13. take care of someone or something | 1.19 | 104
14. view someone with superiority | 0.77 | 72
15. for a structure or part of body to have a view or outlook in a specified direction | 0.58 | 83
16. inspect something briefly | 0.44 | 60
17. quickly take notice | 0.32 | 57
18. have respect for someone | 0.30 | 47
19. ignore wrongdoing | 0.29 | 65
20. search for and find particular information in a piece of writing | 0.25 | 62
21. observe an event without getting involved | 0.23 | 42
22. have the appearance of being as old as you are | 0.16 | 51
23. suffer a setback | 0.10 | 47
24. reminisce past events | 0.07 | 23
25. express or show something to someone | 0.04 | 23
26. pay a short visit | 0.03 | 22
27. appear your usual self | 0.026 | 18
28. evaluate someone or something with a quick glance | 0.025 | 23
29. make future plans | 0.022 | 30
30. express a perceived air of superiority | 0.016 | 23
31. bring an improvement to a situation | 0.016 | 19
32. expect (hope) to do something | 0.011 | 19
33. pay a social visit while going somewhere else | 0.0005 | 24
34. ignore someone or something by pretending not to notice them | 0.0003 | 18
35. observe someone without showing embarrassment or fear | 0.0002 | 12
Table 4.7 Ordered prediction ratios constrained within 35 senses alongside the maximal possible prediction ratios for each sense.
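The relationship between a prediction ratio and its theoretical maximum can be made concrete with a small sketch. The function names are hypothetical and the list of predictive powers is invented for illustration; only the pair 28.95 and 176 is taken from Table 4.7.

```python
def prediction_ratio(predictive_powers):
    """Sum of the predictive powers of all features co-occurring with one sense."""
    return sum(predictive_powers)


def share_of_theoretical_max(ratio, n_feature_types):
    """Prediction ratio relative to the ideal case in which every feature type scores 1."""
    if n_feature_types <= 0:
        raise ValueError("a sense needs at least one attested feature type")
    return ratio / n_feature_types


# Invented predictive powers for three features of one hypothetical sense.
toy_ratio = prediction_ratio([0.2, 0.05, 0.01])

# Rank 2 in Table 4.7: prediction ratio 28.95 against a theoretical maximum of 176.
gaze_share = share_of_theoretical_max(28.95, 176)
```

Read this way, even the second-ranked sense reaches only about a sixth of its theoretical maximum, which gives a sense of how far the attested values lie from the ideal case.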
4.4 Evaluation of the results for sense distinctness

As we indicated in the previous chapter, for the purposes of in vitro evaluation of the results of disambiguation based on the given predictive powers and prediction ratios (Palmer, Ng and Dang 2006: 76), and ultimately the practical applicability of the corpus-based methodology on sense distinctness, the CASIS Example Corpus has been constructed as a standard form of a word sense induction and disambiguation task. This step 5 of corpus-based analysis consists of a test set comprising 1,392 random sentences all containing the lexeme look (both noun and verb). They were extracted from the BNC and annotated using the same set of contextual features as in the original training set (Table 3.2), the only difference being that these sentences have not been marked for senses. The original set of senses is only used as an external source of knowledge serving as a cross-reference and linked to the identified predictive powers. The decision list of the disambiguator program included the following steps:

- after identifying the POS of the lexeme look in the sentences (based on the presence or absence of the tag ‘Tense’), the program assigns each of the features in a given sentence a predictive power in potentia (based on the previously calculated predictive powers given in Appendices 6 and 7). These provisional values of the predictive powers represent every possible value one feature can have in every context attested in the training set of the CASIS Corpus;
- the senses which have been identified as co-occurring with all or some of the features attested in the sentence at hand are then activated in the described reference list of 35 senses;
- the program then calculates the prediction ratios out of the predictive powers for each of the previously activated senses present online and offered up in the first step of the decision tree; and
- the last step involves the program putting forth an ordered list of prediction ratios and their correspondent readings - the top of the list indicating accurate sense disambiguation.

To illustrate the procedure on an example we can take one sentence [43] out of the CASIS Example Corpus (coming originally from the BNC):

[43] He tossed his horse's reins to a groom and went storming off looking for Dacourt.

Mimicking the original annotation procedure, all of the relevant contextual features (following the original tagset) have been manually marked. The subsequent annotation (copied from the corpus tagger used in both the training and evaluation data processing) shows how one of the mentioned student annotators processed the sentence from the example [43]:

POS of look: verb
Related Tags
Tense: Past simple
Aspect: simple
Voice: active
intransitive/transitive/complex transitive/linking verb: transitive
declarative/interrogative/imperative: declarative
main/subordinate/relative subordinate: main
Tagged sentence features
Subject:
- pronoun
Object:
- human
- uncountable
- simple
- singular
L1: off
L2: storming
R1: for 27

The senses which have been attested as co-occurring with all or some of these features are then activated in the disambiguation decision list.
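The steps of the decision list can be summarized in a short sketch. It is not the disambiguator program itself: the function name, the shape of the lookup table and all predictive power values are hypothetical, chosen only to show how activation, summation and ranking fit together.

```python
from collections import defaultdict


def rank_senses(sentence_features, power_table, top_n=None):
    """Rank senses by prediction ratio for one annotated sentence.

    power_table maps (feature, sense) pairs to predictive powers; a sense is
    activated as soon as one of the sentence's features co-occurs with it.
    """
    observed = set(sentence_features)
    ratios = defaultdict(float)
    for (feature, sense), power in power_table.items():
        if feature in observed:
            ratios[sense] += power  # prediction ratio = sum of predictive powers
    ranked = sorted(ratios.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]


# Invented predictive powers, for illustration only.
toy_table = {
    ("R1: for", "attempt to find"): 0.20,
    ("Voice: active", "attempt to find"): 0.01,
    ("Voice: active", "direct your gaze"): 0.02,
    ("R1: at", "direct your gaze"): 0.30,
}
ranking = rank_senses(["Voice: active", "R1: for"], toy_table)
```

With these toy values the feature ‘R1: for’ lifts ‘attempt to find’ to the top of the list, whereas ‘R1: at’ contributes nothing because it is not among the features of the sentence.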
They included all of the 35 senses, the senses themselves co-occurring with between the minimum of 4 and the maximum of all 12 identified features 28 in the example sentence [43] above (as given in Table 4.8).

27 The marked collocates, as we remember from the description of the non-predefined features and their annotation in the original CASIS corpus, are given here as potential collocates and their status was to be decided on (based on statistical relevance) later on.

28 Leaving out ‘L2: storming’ and ‘L1: off’ which were not to be found in the original tagset as one of the 85 relevant non-predefined features after all.

SENSES | FEAT. TYPES | IDENT. TYPES
direct your gaze towards someone or something or in a specified direction | 12 | 12
for a structure or part of body to have a view or outlook in a specified direction | 12 | 8
express or show something to someone | 12 | 8
ignore someone or something by pretending not to notice them | 12 | 6
inspect something briefly | 12 | 11
peruse a book or other written material | 12 | 9
move round in order to investigate something | 12 | 12
think of or regard in a particular manner | 12 | 12
examine and consider | 12 | 12
investigate in great detail | 12 | 12
attempt to find | 12 | 12
evaluate someone or something with a quick glance | 12 | 5
have the appearance of | 12 | 12
show likelihood of | 12 | 10
appear your usual self | 12 | 4
rely on someone or something | 12 | 12
expect (hope) to do something | 12 | 4
have the appearance of being as old as you are | 12 | 8
express a perceived air of superiority | 12 | 8
view someone with superiority | 12 | 10
observe someone without showing embarrassment or fear | 12 | 3
ignore wrongdoing | 12 | 11
make future plans | 12 | 10
evaluate someone carefully | 12 | 11
take care of someone or something | 12 | 11
reminisce past events | 12 | 8
suffer a setback | 12 | 10
eagerly await something or someone | 12 | 12
pay a short visit | 12 | 5
observe an event without getting involved | 12 | 7
quickly take notice | 12 | 12
bring an improvement to a situation | 12 | 4
pay a social visit while going somewhere else | 12 | 9
search for and find particular information in a piece of writing | 12 | 10
have respect for someone | 12 | 10
Table 4.8 The number of identified types of features (out of the 12 possible attested ones) in the example sentence [43] ‘He tossed his horse's reins to a groom and went storming off looking for Dacourt.’ for each of the senses they co-occur with.

The next suggested step is summing up the predictive powers each of these 12 attested contextual features has in respect of each of the given senses it co-occurs with. For this the program refers to the information we can see in Appendices 6 and 7, listing all the predictive powers originally attested for these 12 features. The sums indicating prediction ratios are given in Table 4.9. After generating such a hierarchical representation of the most probable disambiguation solutions for a given sentence (or rather the verb look in it), the program then offers up the three top-ranking prediction ratios as the three most probable readings. Instead of selecting only the top one, we have the program offer us the top three in an attempt to soften the influence of the most frequently occurring sense which tends, due to its overwhelming number of occurrences, to be forced most frequently as the first choice. And, as Table 4.9 suggests, the second-ranking reading of ‘attempt to find’ seems to be the correct one.

R SENSES | PRED. RATIOS
1. direct your gaze towards someone or something or in a specified direction | 0.229
2. attempt to find | 0.208
3. have the appearance of | 0.057
4. think of or regard in a particular manner | 0.033
5. examine and consider | 0.008
6. move round in order to investigate something | 0.007
7. rely on someone or something | 0.006
8. investigate in great detail | 0.005
9. evaluate someone carefully | 0.004
10. eagerly await something or someone | 0.004
11. take care of someone or something | 0.002
12. ignore wrongdoing | 0.002
13. view someone with superiority | 0.002
14. show likelihood of | 0.001
15. quickly take notice | 0.001
16. inspect something briefly | 0.001
17. peruse a book or other written material | 0.001
18. search for and find particular information in a piece of writing | 0.001
19. have respect for someone | 0.001
20. have the appearance of being as old as you are | 0.001
21. for a structure or part of body to have a view or outlook in a specified direction | 0.0011
22. suffer a setback | 0.0003
23. observe an event without getting involved | 0.0003
24. make future plans | 0.0003
25. reminisce past events | 0.00013
26. express a perceived air of superiority | 0.00013
27. pay a social visit while going somewhere else | 0.00009
28. evaluate someone or something with a quick glance | 0.00009
29. express or show something to someone | 0.00009
30. ignore someone or something by pretending not to notice them | 0.00003
31. bring an improvement to a situation | 0.00002
32. appear your usual self | 0.00002
33. pay a short visit | 0.00002
34. expect (hope) to do something | 0.00001
35. observe someone without showing embarrassment or fear | 0.00001
Table 4.9 The list of ordered prediction ratios of the potential senses in the example sentence [43] ‘He tossed his horse's reins to a groom and went storming off looking for Dacourt.’

Leaving the relatively successfully disambiguated example [43] aside for now, in order to evaluate the effectiveness of disambiguation for the overall results in all of the 1,392 sentences (and the application of the methodology to sense distinctness as well) there are three measures that need to be obtained from applying the given decision tree to the entire sample (Navigli 2009: 41-42):

- the coverage (C), which represents the percentage of senses offered within the entire test set (for all the sentences in the test set). C equals 1 in our case as the program offered a sense for each of the examples in the testing data in the CASIS Example Corpus. There were 1,280 sentences containing the verb (as opposed to 112 sentences with a noun look in them) and the program recognized them successfully and offered up a sense for each of them;
- the precision (P), which represents the percentage of the correctly disambiguated instances of the given word out of the senses attested in the test set. In our case P is 0.289 as we have had 371 correctly disambiguated sentences out of a sample of 1,280; and
- the recall or accuracy (R), which represents the ratio between the correctly identified senses and the total number of possible (or rather listed) ones. As we have had a case where coverage was 100%, R actually equals P and is 0.289 as well.

A successful approach should then have R ≤ P (they are equal, as in our case, when C=1 and the coverage is total, as it normally is in targeted WSD which focuses on one particular lexical item rather than disambiguating all of them in the given sentence/text). The corpus-based sense distinction tested on the CASIS Example Corpus gives us the weighted mean of the relationship between the precision and accuracy (or the F-score (Artiles, Amigó and Gonzalo 2009: 462)) of 0.289. When we compare it to the random baseline representing the result which would be obtained by randomly offering up solutions (Gale et al. 1992: 253) of 7.96e-8 we can see that it easily exceeds this lower bound. The prediction success seems quite high when dealing with such a highly polysemous lexeme as the verb look. It comes up to almost 30% and is seemingly on par with such MT software as Yahoo! Babelfish, Google Translate or Microsoft Bing Translator (Guerberof Arenas 2010: 4-5). Scratching again under the surface though, we can in fact discover serious drawbacks to the presented disambiguation success. A comparison with the upper ITA bound (Duffield et al. 2007) and a closer look at the successfully disambiguated cases are then called for.
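The three measures and the F-score can be recomputed from the counts reported above. The sketch below is a minimal illustration with a hypothetical function name; it assumes the balanced (harmonic mean) form of the F-score, which the text does not specify beyond calling it a weighted mean.

```python
def evaluate(n_instances, n_answered, n_correct):
    """Coverage, precision, recall and balanced F-score for a WSD test set."""
    coverage = n_answered / n_instances
    precision = n_correct / n_answered if n_answered else 0.0
    recall = n_correct / n_instances
    total = precision + recall
    # Balanced F-score (harmonic mean of P and R); an assumption about the weighting.
    f_score = 2 * precision * recall / total if total else 0.0
    return {"C": coverage, "P": precision, "R": recall, "F": f_score}


# Counts from the text: 1,280 verb sentences, all given a sense, 371 of them correctly.
scores = evaluate(n_instances=1280, n_answered=1280, n_correct=371)
```

Because every verb instance received an answer, coverage is 1 and precision, recall and the F-score all collapse into the same value, 371 divided by 1,280.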
The manual inspection of the way in which the success of corpus-based sense disambiguation relates to the ITA bound starts by looking at the last step in the described program’s decision list. The CASIS Example Corpus suggests as accurate the senses corresponding to the top 3 ranking prediction ratios in an effort to compensate for the issues of the dominance of the most frequently occurring senses (Kilgarriff and Rosenzweig 2000). It is the more detailed examination of the senses offered as the top 3 ranking ones that exposes the real problem of the exactitude of the methodology. Namely, out of 35 possible answers, the predictive powers have, for each and every one of the 1,280 instances of the verb look in the example corpus, only ever offered just three - always the same three most frequently occurring and most prediction-strong senses. In all of the 1,280 sentences containing the verb, the first offered sense was ‘direct your gaze towards someone or something or in a specified direction’; the next two rankings (2 and 3) were always alternately taken up by only two senses - ‘have the appearance of’ and ‘attempt to find’. Hence, basically all of the successful sense disambiguation (if we observe all of the three rankings combined) was solely of these three mentioned senses. In fact, the same rank 1 sense was suggested as appropriate for every single sentence in the test set, that of ‘direct your gaze towards someone or something or in a specified direction’. This was also almost entirely caused by their high frequency of occurrence, both in general use and within the example corpus as well, something indicated by all quantitative data we saw so far. This result tells us most, as we will discuss further on, about the implications the practical methodology brings forth.
When we move away from the most frequent (and hence most prediction-strong) senses, we can see that not a single less frequent (usually idiomatic) reading has been successfully identified. Such an outcome seems to strongly suggest that any upper bound reflecting human-based disambiguation would be far above the reach of this system. Using a relatively simplified analysis and actually performing all of the steps of the analysis and evaluation manually rather than employing the suggested elaborate statistical methods (such as for instance the commonly used hierarchical cluster analysis (Gries 2006: 93)) or equally elaborate learning algorithms (Ng 1997) has allowed us to break down the corpus-based semantic methodology of sense discrimination into its most transparent constituents. Its lack of success as WSD pinpointed the exact stumbling block of the approach when applied to defining word senses.

4.5 The theoretical and practical implications

Confirming the mentioned findings of earlier similar research endeavors (see for example Ide and Veronis 1998; or Navigli 2009) we can see that using only linguistic (microcontextual) information as the source of knowledge in discriminating between readings of a polysemous lexeme (as corpus-based semantics practically employs) failed to satisfy the desired ITA standards and cannot, in terms of human-based discrimination, be considered as successful. The reason is again related to data sparseness (at various levels) coupled with data skewness inherent to any corpus-based research. We have demonstrated that, even when drawing on large contemporary corpora, there is not enough quantitative data to provide sufficient and sufficiently detailed information about the senses of a given word and that the data we did obtain tended to be seriously skewed.
We saw that out of 42 senses compiled from dictionaries (which do not even foresee the open-endedness of polysemous lexical networks) only 35 were identified in the corpus sample of 13,119 randomly selected sentences. Furthermore, out of those 35 identified senses some were found only once, while others, indicating again the skewness of the data, appeared more than 5,000 times. Even if this sort of data skewness could be compensated for by employing elaborate statistical methods or indefinitely extending the corpus sample or manually pre-selecting the sentences, data sparseness would be a problem for the identification and semantic value of the contextual features as well. By default, the less frequently occurring senses offered just a small chance to attest a sufficient number of correlating contextual features and avoid chance co-occurrences. This problem had a major effect on the predictive powers (and ultimately prediction ratios) of the attestable and attested features, since the frequency of occurrences is one of the two dimensions that construe them. The other dimension of the predictive powers, the distinctiveness of the contextual features, was also affected by the problem of data sparseness. The number of truly distinctive contextual features (the ones that appeared with only one particular sense or indeed very few senses) was also very low. The question that arises at this point is whether data sparseness could be addressed perhaps by increasing the size of the sample further in an attempt to increase the amount of quantifiable data or perhaps by extending the ontology of the contextual features in order to define the contextual patterns and the distribution of readings they condition in more depth.
The first reason why neither of these solutions would bear fruit is that it would be very demanding to manually or even semi-automatically process the extremely large numbers of sentences (and here we are talking millions since thousands did not suffice). This difficulty of processing data would be even more striking if we were to include more complex ontologies, such as theta roles or argument structures, for example. This difficulty of processing relates also to manually scouring endless corpora for preselecting the corpus samples. The second and more important reason why tampering with the corpus sample in any manner would not work is that the results presented above seem to indicate that the quantitative discrepancy between the very frequently occurring senses and the least frequently occurring ones would again considerably influence both the distinctiveness and the frequency of occurrences and co-occurrences of the contextual features. The original problem of data sparseness when identifying senses alone is sufficient to derail the methodology regardless of the size or the composition of the corpus. The more frequent a sense is, the more co-occurring features it can be identified with and the more diverse those features will be - the increased contextual interplay being true both for corpus samples and for language use in general. It is hence unlikely that a sufficient number of prediction-powerful contextual features (i.e. distinctive and at the same time frequent enough) can be found within any possible contextual environment regardless of the complexity of the ontology we may choose or the size of the corpus we may opt for. More frequently occurring senses will by default interact with more contextual patterns, rendering the patterns themselves prediction powerless.
Less frequently occurring senses will not proffer enough quantitative contextual linguistic data, and what we do obtain would be tainted by the influence of the data gathered for the more frequently occurring senses. The result of any such sense disambiguation approach based entirely on microcontextual linguistic data would most likely mirror what we have seen in the experiment presented above - the predictive powers would be higher for the more frequent senses and would decline exponentially (nearing zero) for the less frequent ones, all due to multifaceted data sparseness and data skewness. These practical implications relate back to the main theoretical backbone of the methodology: that you can know a word by the company it keeps (Firth 1957: 16). The theoretical conclusion that seems unavoidable is that, when it comes to semantic meaning and the manner in which the multiple meanings interplay within a polysemous lexical item, microcontextual conditioning is simply not strong enough. Hence, you cannot know a word by its immediate company. The problem of data sparseness in this respect, however, does not refer to the impossibility of gathering enough data, since the corpora available these days number hundreds of millions of words and the manner of their processing is constantly being improved. Instead, data sparseness should be understood as the impossibility of finding enough meaningful, prediction-strong data: enough microcontextual, solely linguistic features that are distinctive enough to condition a particular reading exclusively, to such an extent that this conditioning separates it from other related, finely grained senses of the same polysemous item. Most contextual features in reality co-occur very widely - the more frequent the sense, the greater the need imposed by its use to occur in more diverse contexts.
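The skewness discussed above can be made concrete with a minimal sketch. It uses the raw sense frequencies reported for the sample (Table 5.1); the function names and the threshold of ten attestations are illustrative assumptions and not part of the original procedure.

```python
# Raw sense frequencies for the verb "look" as reported in Table 5.1
# (35 attested senses, ordered from most to least frequent).
SENSE_FREQUENCIES = [
    5742, 3045, 1860, 894, 240, 217, 173, 165, 129, 124, 110, 105,
    38, 37, 37, 34, 29, 27, 25, 20, 18, 16, 11, 5, 4, 2, 2, 2, 2,
    1, 1, 1, 1, 1, 1,
]


def cumulative_share(frequencies, top_n):
    """Share of all sampled sentences covered by the top_n most frequent senses."""
    ordered = sorted(frequencies, reverse=True)
    return sum(ordered[:top_n]) / sum(ordered)


def senses_below(frequencies, threshold):
    """Number of senses attested fewer than `threshold` times.

    The threshold is a hypothetical cut-off chosen only for illustration.
    """
    return sum(1 for f in frequencies if f < threshold)


if __name__ == "__main__":
    print(f"sample size: {sum(SENSE_FREQUENCIES)}")
    print(f"share of the top sense: {cumulative_share(SENSE_FREQUENCIES, 1):.1%}")
    print(f"share of the top three senses: {cumulative_share(SENSE_FREQUENCIES, 3):.1%}")
    print(f"senses attested fewer than 10 times: {senses_below(SENSE_FREQUENCIES, 10)}")
```

On these figures the single most frequent sense covers roughly 44% of the sample and the top three senses roughly 81%, which is the imbalance the argument above rests on.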
It follows that using methods such as corpus-based semantic analysis as the basis for defining sense distinctness can give us only so much rope to go down the rabbit hole that is semantic meaning. Meaning is so context-dependent (including here various kinds of contexts, starting with the micro one and going all the way up to what we may term general knowledge) that turning it solely into a list of coded rules using linguistic data alone and presenting it as a matter of numbers does not seem feasible. Hence, it is important to be aware of the limitations of this and similar methods and of <?page no="94"?> the limitations of the application of statistics in lexical semantics. The methodology should ultimately be understood as perhaps only one in an arsenal of tools which may aid us in lexicology, the practical usefulness of which, when we have in mind the amount of (manual) input these methods usually require, is still to be ascertained by its more extensive use in distribution-based lexicography.

<?page no="95"?>

5. A corpus-based account of prototypicality

The prototype-based conception of the categorization of knowledge (and subsequently of meaning) quite famously originated in the mid-1970s with Eleanor Rosch’s (1975) psycholinguistic research. From her study into the notions of colors the field moved in two directions: on the one hand, her ideas were taken up by formal psycholexicology (information-processing psychology), which tries to devise formal models for human conceptual memory and its operation; on the other hand, there was the ever-growing linguistic prototype theory, starting in the mid-1980s from within Cognitive Linguistics.
Following this development within Cognitive Linguistics (or rather Cognitive Semantics), there are four characteristics which are frequently mentioned as distinctive for category organization and prototypicality (Geeraerts 2010: 184-192): first, all categories exhibit degrees of prototypicality, as not every member is equally representative of a given category; second, all categories exhibit family resemblance among their members; third, all categories are blurred (fuzzy) at the edges; lastly, categories cannot be defined by means of a single set of necessary and sufficient criteria. These characteristics do not always co-occur and can be exhibited in various combinations and to various degrees by individual lexical items (Dobrić 2011). When applied to semantics, the clustering of meaning that is typical of a family resemblance structure is quite evident and implies unequal structural importance (Evans and Green 2006: 328-355). We can hence clearly recognize the possibility of applying prototype theory to the description of polysemy (Geeraerts 2010: 192). In fact, prototypicality and category membership seem to be such central notions of human cognition that the expectation that they are to be found within the realm of linguistic representations as well comes as a given (Tsohatzidis 1990). The idea implied is that linguistic categories should also be observed as fuzzy at the edges but clear in the center (Grković-Major 2008: 53). What follows is that polysemous meanings should not be seen in isolation, but rather as related to a central sense and to one another by means of meaning extensions. Family resemblance effects make any polysemous lexeme seem like a cluster of mutually interrelated readings stemming from a central core reading (Evans and Green 2006: 344-348). The central reading is the one seen as encompassing a maximum number of structurally salient features (such as the contextually co-occurring ones, for example).
The mainstream representational model for such a prototype-based polysemous structure is the already <?page no="96"?> mentioned radial network model (Brugman 1988). As we recall, within this model of polysemy the senses were seen as related to the prototype and to one another by means of individual semantic links (Geeraerts 2010: 192). These links make up a lexical network of the meaning of a given lexeme and are marked by the kind of semantic relation that underlies them. The disadvantage of such an early model was that the meanings were represented as relatively isolated entities, since their complex and subtle relations were not made explicit enough. Category members could perhaps be seen as more naturally defined by whether or not they are semantically related to the best exemplar of that category - the prototype. The first question that arises here is what makes one sense the prototypical one.

5.1 Prototypicality of senses

There are different features put forth as characteristic of these most representative readings of polysemous lexemes (Gilquin 2006: 160): they are acquired by children before any other members of a given semantic category; they are usually the ones that first come to mind and are produced first in psychological priming experiments; they are easiest to recognize and are hence perceptually most salient; since prototypes are products of earliest acquisition, they are also easiest to memorize (and hence fastest and easiest to elicit); and they are the earliest attested senses in etymological terms. The problem in attesting these characteristics lies in finding tangible evidence of the prototypicality of senses in the linguistic behavior of any given lexeme and expressing it in solid criteria. The search was on, then, for criteria which would pinpoint the central readings of polysemous lexemes and which would define the semantic and hierarchical relationships between them and the other members of the polysemous spectrum.
Such information on the links between the senses within a polysemous lexeme would complement or even eliminate the need for using the listed extra-linguistic criteria (based mostly on elicitation experiments) to mark the prototypicality of readings. The first such empirical investigations into the prototypicality of linguistic categories are usually tied to contemporary corpus analyses. The most convincing proof of the prototypicality of linguistic categories came precisely <?page no="97"?> from corpus data because it was the first to point unequivocally to their non-discreteness and their fuzziness (Teubert 1996: v). It appears that the hazy nature of frequency data perfectly reflects the idea of category membership and prototypicality - the manner in which linguistic representations are organized seems to form a continuum, with central and peripheral members within it reflected in their frequency of occurrence. The notion was further reinforced through various attempts at connecting corpus methodology and linguistic theory. One such link came from the realization that the attested frequency in language is to be seen as responsible for the entrenchment of the given frequent structures in the cognitive system - the more frequent a given linguistic structure is, the more salient it becomes and hence the more prototypical it is as stored in the mind (Gilquin 2006: 166). Following this very intuitive line of argumentation, the identification of the prototypical member of any linguistic category is a simple matter of ascertaining the most frequently occurring one (see for example Geeraerts (1988); or Aitchison (1998)), all provided to us by our corpus-based semantic analysis. One drawback of such a theoretical account is that the frequency data can only sufficiently point to the prototype alone, as we will see in our analysis further on.
We cannot learn much about the structure of the category membership or explain the nature of the members’ relationship both with the prototype and with each other. Hence, expanded insight into the link between linguistic information and prototypicality, building on the essential link between frequency and prototypicality and using other available (corpus) data, seems necessary.

5.2 Quantitative analysis

Acting on the premise that linguistic information such as the frequency of occurrence or contextual salience can and does condition and reveal the prototypicality of senses, the following sections will give an account of how corpus-based semantics can contribute to quantifying the fuzzy and scalar corpus data and provide a more criteria-based and a more detailed outline of membership in linguistic (or rather semantic) categories. As mentioned above, a prototype of a given linguistic category should correspond to the central tendency of that category and is seen as its ideal member. Its core status seems to be derived from two linguistic factors: the highest frequency of use and the widest contextual appearance. Now, unlike <?page no="98"?> elicitation experiments which rely on native-speaker intuition for identifying prototypes, a (corpus) linguistic model relies foremost on quantitative data, which goes hand in hand with the account of the prototypicality of linguistic (semantic) categories we presented earlier. Its success and applicability, however, depend on the choice of linguistic (contextual) criteria for marking prototypicality. Following that line of reasoning, since a prototype is supposed to be the best exemplar of its category, a corpus investigation should identify the given prototypical reading as the most frequently occurring one and as the one semantically and contextually most strongly linked to all of the other members of the same category. In other words, the prototypical reading should display the highest degree of inter-category similarity.
An important benefit of focusing on inter-category similarity as a marker of prototypicality of senses is that the differences between the extents of inter-category similarity of other senses among themselves can be seen in turn as corresponding to the distances between all members of a category. To recapitulate, we will be defining as prototypical a sense which occurs in language use (and consequently in corpus data) more frequently than all other related senses within one polysemy spectrum, which appears in more diverse contextual patterns, and which shares more contextual distribution with all of the related senses than any other reading of the same lexeme. After identifying the prototypical reading following these criteria, the same information will serve us as the basis of attesting the degrees of category membership of all other related readings. The more frequently occurring, contextually diverse, and contextually inter-category similar a sense is, the closer to the core it will be. The corpus-based semantic analysis of the verb look we performed through steps 1-3 gives us the necessary linguistic information: the weighted strengths of semantic conditioning of each of the building-blocks of the surrounding linguistic context (predictive powers); the joint strength of semantic influence for each given context (prediction ratio); the width of interplay of one sense with other senses in terms of shared linguistic context (to be seen later as features ratio); the strengths of the links between different contextual features attested with different readings (further on termed overlap ratio); and finally the weighted influence of the distributional properties of the readings on the levels of the said readings’ prototypicality (presented subsequently as prototypicality ratio). 
Having thus defined the corpus-based semantic parameters we will be using in accounting for the prototypicality <?page no="99"?> of the readings of the verb look, the most sensible way to start the discussion is to try and identify the prototype meaning first. We have suggested that in purely linguistic terms two sets of data point to a possible best exemplar sense. One is relatively simple and refers to the frequency of appearance of the given sense in comparison to the frequencies of all other senses of the given polysemous lexeme. It must be said that such a claim is seemingly backed by the criteria mentioned earlier, such as ease of activation, earliest acquisition, and other prototype markers, which can all in essence be linked to the frequency of use and the linguistic and cognitive sedimentation it causes. The other set of data is a bit more difficult to attest, since it depends upon identifying the degree to which the given prototype candidate encompasses other senses in terms of its occurrence and interplays with them in terms of the contextual features they share (basically the total degree of its inter-category similarity). When it comes to the first criterion of comparative frequency, looking once more at Table 5.1 below (which is the same as Table 4.1, only repeated here for easier access with the addition of the code system), we can see that the sense ‘direct your gaze towards someone or something or in a specified direction’ towers above all other senses in terms of its occurrence. It seems indisputable that, according to this first, most basic, and most intuitive criterion, the given sense can be postulated as being the prototypical one. However, in the discussion above we have mentioned that apart from this rather convincing measure we also need to look at other linguistic features in order to confirm the proposed prototypical status.
We need to confirm whether this criterion can be further supported (or disproved) by calling on the information relating to inter-category similarity. This second criterion comprises two complementary values. One value gives us the type and token frequency of the contextual features identified with each of the senses, the focus being on the highest score, which should indicate the highest degree of contextual diversity. The second value gives us the number of the senses with which each of the individual senses overlaps in terms of shared contextual features and the extent to which they do so. If we look at Table 5.2, we can see the first set of values - the ratio between the feature types and tokens identified per sense (the feature ratio, obtained by dividing the number of feature tokens identified with one sense by the number of the identified feature types). It gives us crucial information on how widely one sense appears in terms of different linguistic contexts (the mentioned contextual diversity of a given sense).

<?page no="100"?>

R (rank) | C (code) | SENSE | FREQ.
1 | 1 | direct your gaze towards someone or something or in a specified direction | 5,742
2 | 13 | have the appearance of | 3,045
3 | 11 | attempt to find | 1,860
4 | 8 | think of or regard in a particular manner | 894
5 | 9 | examine and consider | 240
6 | 7 | move round in order to investigate something | 217
7 | 10 | investigate in great detail | 173
8 | 14 | show likelihood of | 165
9 | 28 | eagerly await something or someone | 129
10 | 16 | rely on someone or something | 124
11 | 25 | take care of someone or something | 110
12 | 24 | evaluate someone carefully | 105
13 | 20 | view someone with superiority | 38
14 | 6 | peruse a book or other written material | 37
15 | 22 | ignore wrongdoing | 37
16 | 5 | inspect something briefly | 34
17 | 31 | quickly take notice | 29
18 | 34 | search for and find particular information in a piece of writing | 27
19 | 2 | for a structure [...] to have a view or outlook in a specified direction | 25
20 | 30 | observe an event without getting involved | 20
21 | 18 | have the appearance of being as old as you are | 18
22 | 35 | have respect for someone | 16
23 | 27 | suffer a setback | 11
24 | 23 | make future plans | 5
25 | 26 | reminisce past events | 4
26 | 15 | appear your usual self | 2
27 | 19 | express a perceived air of superiority | 2
28 | 32 | bring an improvement to a situation | 2
29 | 33 | pay a social visit while going somewhere else | 2
30 | 3 | express or show something to someone | 1
31 | 4 | ignore someone or something by pretending not to notice them | 1
32 | 12 | evaluate someone or something with a quick glance | 1
33 | 17 | expect (hope) to do something | 1
34 | 21 | observe someone without showing embarrassment or fear | 1
35 | 29 | pay a short visit | 1

Table 5.1 Raw frequencies and ranks of the 35 attested senses. [29]

[29] The sense codes 1-35 (given also in Appendix 1) are important because of their appearance in the discussion and in the sense network outline further on, where they were used instead of the full sense paraphrases due to limitations of space.

<?page no="101"?>

R (rank) | SENSE | TYPES | TOKENS | FEAT. RATIO (Tokens/Types)
1 | direct your gaze towards someone or something or in a specified direction | 176 | 119,314 | 677.92
2 | have the appearance of | 181 | 51,158 | 282.64
3 | attempt to find | 154 | 38,266 | 248.48
4 | think of or regard in a particular manner | 144 | 19,264 | 133.78
5 | examine and consider | 129 | 5,545 | 42.98
6 | move round in order to investigate something | 98 | 3,743 | 38.19
7 | investigate in great detail | 114 | 3,235 | 28.38
8 | rely on someone or something | 98 | 2,604 | 26.57
9 | evaluate someone carefully | 80 | 1,936 | 24.20
10 | eagerly await something or someone | 107 | 2,504 | 23.40
11 | show likelihood of | 103 | 2,341 | 22.73
12 | take care of someone or something | 104 | 1,727 | 16.61
13 | view someone with superiority | 72 | 846 | 11.75
14 | peruse a book or other written material | 68 | 784 | 11.53
15 | inspect something briefly | 60 | 648 | 10.80
16 | ignore wrongdoing | 65 | 645 | 9.92
17 | search for and find particular information in a piece of writing | 62 | 548 | 8.84
18 | quickly take notice | 57 | 490 | 8.60
19 | have respect for someone | 47 | 374 | 7.96
20 | observe an event without getting involved | 42 | 302 | 7.19
21 | for a structure [...] to have a view or outlook in a specified direction | 83 | 535 | 6.45
22 | have the appearance of being as old as you are | 51 | 266 | 5.22
23 | suffer a setback | 47 | 187 | 3.98
24 | make future plans | 30 | 101 | 3.37
25 | reminisce past events | 23 | 48 | 2.09
26 | appear your usual self | 18 | 29 | 1.61
27 | bring an improvement to a situation | 19 | 28 | 1.47
28 | express a perceived air of superiority | 23 | 33 | 1.43
29 | pay a social visit while going somewhere else | 24 | 31 | 1.29
30 | express or show something to someone | 23 | 23 | 1
31 | ignore someone or something by pretending not to notice them | 18 | 18 | 1
32 | evaluate someone or something with a quick glance | 23 | 23 | 1
33 | expect (hope) to do something | 19 | 19 | 1
34 | observe someone without showing embarrassment or fear | 12 | 12 | 1
35 | pay a short visit | 22 | 22 | 1

Table 5.2 Ranked frequencies of the features per sense with feature ratios.

<?page no="102"?>

The information in Table 5.2 above clearly confirms the prototypical status of the ‘direct your gaze towards someone or something or in a specified direction’ sense. Its 176 identified types of features (including the predefined and non-predefined ones) appeared 119,314 times. This combination results in the highest feature ratio and makes this sense the contextually most diverse. What we need now is the second dimension of this second criterion which is, if we recall, the (highest) degree of inter-category similarity. This criterion deals with the number of different senses with which one given sense (in this case the candidate for the prototype, ‘direct your gaze towards someone or something or in a specified direction’) shares features and with the numerical value (strength) of these overlapping points. The matrix in Table 5.3 below gives us that information.
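The feature ratio reported in Table 5.2 can be reproduced with a minimal sketch. The type and token counts are copied from the table for four senses; the function and variable names are assumptions introduced here for illustration only.

```python
def feature_ratio(tokens: int, types: int) -> float:
    """Feature tokens divided by feature types for one sense (Tokens/Types)."""
    if types <= 0:
        raise ValueError("a sense needs at least one attested feature type")
    return tokens / types


# (types, tokens) pairs copied from Table 5.2 for a few senses.
TABLE_5_2_SAMPLE = {
    "direct your gaze ...": (176, 119_314),
    "have the appearance of": (181, 51_158),
    "attempt to find": (154, 38_266),
    "pay a short visit": (22, 22),
}

# Rounded to two decimals, as in the table.
FEATURE_RATIOS = {
    sense: round(feature_ratio(tokens, types), 2)
    for sense, (types, tokens) in TABLE_5_2_SAMPLE.items()
}

if __name__ == "__main__":
    for sense, ratio in FEATURE_RATIOS.items():
        print(f"{sense}: {ratio}")
```

A ratio of 1, as for ‘pay a short visit’, means that every feature type attested with the sense occurred exactly once.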
As Table 5.3 may appear rather complex at first glance, a brief explanation is in order before we move on to a discussion of what it tells us. Both rows and columns in the table represent senses, given here in code (senses 1 to 35), which corresponds to their original listing in Table 5.1 (and in Appendix 1). The intersecting cells in the table give the number of contextual features shared by the two senses in question, while the empty shaded fields mark no overlap. The black cells only mark the points at which one sense intersects with itself in the matrix and have neither value nor significance. The most important part of the table, however, is the section where the sums are given. These cells represent the result of the whole exercise and are thus the point from which we can continue our discussion. For easier access the sum cells from Table 5.3 have been copied into Table 5.4 further on (into the columns marked Overlaps and Features shared). Overlaps gives the number of individual senses that share contextual features with the given sense, and Features shared gives the total number of features shared among those intersecting senses. Dividing the number of features shared by the number of overlaps gives the value in the last column of the table, which basically tells us which sense has the highest degree of inter-category similarity (termed here overlap ratio). And here we get our first surprise, if not a jaw-dropping one - as we can see, the most diverse sense is ‘attempt to find’, interplaying with 32 different senses (out of the 34 remaining ones) and doing so in 831 shared features (giving it an overlap ratio of 25.97). This information at this point seemingly dethrones the ‘direct your gaze towards someone or something or in a specified direction’ sense as the prototype supreme.

<?page no="103"?>

Table 5.3 Overlaps of the contextual features amongst all of the senses.

<?page no="104"?>

C (code) | R (rank) | SENSES | OVERL. | FEAT. SHARED | OVERLAP RATIO
11 | 1 | attempt to find | 32 | 831 | 25.97
1 | 2 | direct your gaze towards someone or something or in a specified direction | 32 | 765 | 23.91
8 | 3 | think of or regard in a particular manner | 30 | 708 | 23.60
9 | 4 | examine and consider | 31 | 713 | 23
13 | 5 | have the appearance of | 32 | 688 | 21.50
14 | 6 | show likelihood of | 31 | 644 | 20.77
2 | 7 | for a structure … to have a view or outlook in a specified direction | 31 | 640 | 20.65
10 | 8 | investigate in great detail | 29 | 572 | 19.72
28 | 9 | eagerly await something or someone | 29 | 542 | 18.69
7 | 10 | move round in order to investigate something | 31 | 566 | 18.26
16 | 11 | rely on someone or something | 30 | 498 | 16.60
25 | 12 | take care of someone or something | 30 | 496 | 16.53
20 | 13 | view someone with superiority | 28 | 326 | 11.64
6 | 14 | peruse a book or other written material | 32 | 323 | 10.09
24 | 15 | evaluate someone carefully | 29 | 275 | 9.48
22 | 16 | ignore wrongdoing | 29 | 254 | 8.76
18 | 17 | have the appearance of being as old as you are | 28 | 229 | 8.18
5 | 18 | inspect something briefly | 29 | 228 | 7.86
34 | 19 | search for and find particular information in a piece of writing | 30 | 224 | 7.47
31 | 20 | quickly take notice | 30 | 195 | 6.50
35 | 21 | have respect for someone | 30 | 195 | 6.50
12 | 22 | evaluate someone or something with a quick glance | 16 | 96 | 6
27 | 23 | suffer a setback | 21 | 98 | 4.67
30 | 24 | observe an event without getting involved | 30 | 125 | 4.17
3 | 25 | express or show something to someone | 4 | 12 | 3
21 | 26 | observe someone without showing embarrassment or fear | 1 | 3 | 3
29 | 27 | pay a short visit | 27 | 68 | 2.52
15 | 28 | appear your usual self | 26 | 54 | 2.08
23 | 29 | make future plans | 29 | 58 | 2
19 | 30 | express a perceived air of superiority | 5 | 9 | 1.80
32 | 31 | bring an improvement to a situation | 11 | 19 | 1.73
26 | 32 | reminisce past events | 23 | 36 | 1.57
17 | 33 | expect (hope) to do something | 28 | 42 | 1.50
33 | 34 | pay a social visit while going somewhere else | 27 | 28 | 1.04
4 | 35 | ignore someone or something by pretending not to notice them | 22 | 21 | 0.95

Table 5.4 Overlap instances and number of overlapped features - the overlap ratios.
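The step from the overlap matrix to the overlap ratios in Table 5.4 can be sketched as follows. Since the cell values of Table 5.3 are not reproduced in the text, the small matrix below is a hypothetical toy example; the function names are likewise assumptions. Only the final division (features shared divided by overlaps) is taken from the figures in Table 5.4.

```python
def overlap_ratio(features_shared: int, overlaps: int) -> float:
    """Shared features divided by the number of overlapping senses."""
    return features_shared / overlaps if overlaps else 0.0


def summarise_overlaps(matrix):
    """Return {sense: (overlaps, features_shared, overlap_ratio)}.

    matrix[a][b] holds the number of contextual features that sense a
    shares with sense b; missing or zero entries mean no overlap, and
    the diagonal (a sense with itself) is ignored.
    """
    summary = {}
    for sense, row in matrix.items():
        cells = [n for other, n in row.items() if other != sense and n > 0]
        overlaps = len(cells)
        shared = sum(cells)
        summary[sense] = (overlaps, shared, round(overlap_ratio(shared, overlaps), 2))
    return summary


# Hypothetical toy matrix for three senses (codes as in Table 5.1).
# The values are invented for illustration and are NOT those of Table 5.3.
TOY_MATRIX = {
    1: {11: 40, 13: 25},
    11: {1: 40, 13: 10},
    13: {1: 25, 11: 10},
}

if __name__ == "__main__":
    for sense, (overlaps, shared, ratio) in summarise_overlaps(TOY_MATRIX).items():
        print(f"sense {sense}: overlaps={overlaps}, shared={shared}, ratio={ratio}")
    # Sum cells reported in Table 5.4 for 'attempt to find': 831 features, 32 overlaps.
    print(round(overlap_ratio(831, 32), 2))
```

Treating a missing cell as no overlap mirrors the empty shaded fields described for Table 5.3.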
<?page no="105"?> However, if we look at the next-ranking sense, we see that it is exactly ‘direct your gaze towards someone or something or in a specified direction’, linked also to 32 out of 34 other senses. It falls behind the first-ranking one in the number of features shared with the linked senses, with 765 feature tokens overlapped (which is only 66 fewer than for the ‘attempt to find’ sense), standing at an overlap ratio of 23.91. If we combine this second rank in the inter-category similarity dimension, given here in the form of an overlap ratio, with the previously described criteria of having the highest feature ratio (signifying the greatest width of the possible contexts in which it can appear) and of being the most frequent of all senses identified within the corpus sample, the sense ‘direct your gaze towards someone or something or in a specified direction’ seems to be confirmed as the best candidate for the prototype of the category of the verb look. We may also bear in mind, if we were to attempt such a combination of the criteria for the sense with the highest overlap ratio, ‘attempt to find’, that according to Table 5.2 it holds only the third-ranking feature ratio, which is significantly smaller than that of the ‘direct your gaze towards someone or something or in a specified direction’ sense (248.48 to 677.92), and that it also arrives only third in the overall frequency ranking (at 3,882 counts fewer than the forerunner ‘direct your gaze towards someone or something or in a specified direction’). Hence, the ‘attempt to find’ sense could not be considered for the place of the most prototypical sense according to two of the three given criteria.
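The comparison of candidates across the three criteria can be sketched as a simple tally. The values are copied from Tables 5.1, 5.2 and 5.4 for the four highest-ranking senses; counting first places per criterion is an illustrative simplification assumed here, not a procedure stated in the text, and the names are hypothetical.

```python
# Frequency (Table 5.1), feature ratio (Table 5.2) and overlap ratio (Table 5.4)
# for the four most frequent senses.
SENSES = {
    "direct your gaze ...": {"frequency": 5742, "feature_ratio": 677.92, "overlap_ratio": 23.91},
    "have the appearance of": {"frequency": 3045, "feature_ratio": 282.64, "overlap_ratio": 21.50},
    "attempt to find": {"frequency": 1860, "feature_ratio": 248.48, "overlap_ratio": 25.97},
    "think of or regard ...": {"frequency": 894, "feature_ratio": 133.78, "overlap_ratio": 23.60},
}

CRITERIA = ("frequency", "feature_ratio", "overlap_ratio")


def rank_by(senses, criterion):
    """Sense names ordered from the highest to the lowest value on one criterion."""
    return sorted(senses, key=lambda name: senses[name][criterion], reverse=True)


def criteria_won(senses):
    """Count, for every sense, the criteria on which it ranks first."""
    wins = {name: 0 for name in senses}
    for criterion in CRITERIA:
        wins[rank_by(senses, criterion)[0]] += 1
    return wins


if __name__ == "__main__":
    for criterion in CRITERIA:
        print(criterion, "->", rank_by(SENSES, criterion))
    print(criteria_won(SENSES))
```

On these values ‘direct your gaze ...’ ranks first on frequency and feature ratio, while ‘attempt to find’ ranks first only on overlap ratio.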
We are then left to once again reaffirm that the contextual linguistic data seems to point, based on the previously defined criteria (of highest comparative frequency of occurrence, highest contextual diversity, and highest inter-category similarity), to the ‘direct your gaze towards someone or something or in a specified direction’ sense as the prototypical reading of the verb look.

5.3 Evaluation of the results for prototypicality

The prototypicality of the ‘direct your gaze towards someone or something or in a specified direction’ sense, which has been asserted using the proposed corpus-based semantic procedure, can be further confirmed (or disproved) by applying a more in-depth qualitative analysis. Such an inquiry would entail going back and looking at other possible and already mentioned avenues of identifying a prototype reading: prototypes are acquired by children before any other members of a given category; <?page no="106"?> prototypes are usually the ones that first come to mind and are produced first in psychological priming experiments; prototypes are easiest to recognize and are hence perceptually most salient; prototypes are also easiest to memorize (and hence fastest and easiest to elicit), seeing that they are products of earliest acquisition; prototype readings can be recognized (as semantically present) in all of the related family members; and prototypes are the earliest attested senses in etymological terms. To avoid using elicitation experiments, our evaluation assumed (following for example Geeraerts (1988); Aitchison (1998); or Gilquin (2006)) that the salience of the core sense (and hence its prototypicality) comes primarily from its highest frequency of use and a higher level of entrenchment. If we go through the parameters listed immediately above, we can postulate that the prototype reading has been acquired earliest due to the fact that children were exposed earliest to it because of its highest frequency of use.
Because of this early entrenchment, the prototypes are easiest to memorize and are first to be activated, recognized, and elicited. Two avenues seemingly unrelated to frequency of occurrence are then left for us to investigate - a descriptive overview of the results and the information on the etymological development of the senses of the verb look. The first avenue involves looking at the whole set of senses in an introspective fashion similar to the attempt given in Chapter 4 and repeated here in Figure 5.1. This descriptive outline (which we do not really have to dissect once more here) clearly confirms our candidate for the prototype. It can be complemented by the fact that the ‘direct your gaze towards someone or something or in a specified direction’ sense is indeed etymologically the earliest one recorded (year 1000 A.D., some 300 years prior to any other recorded meaning of the verb in English according to the OED). Bearing in mind the different avenues we pursued, it becomes quite safe to claim this reading is indeed the center of the category of the senses of the verb look.

<?page no="107"?>

Figure 5.1 Qualitative account of the links between the senses of the verb look.
[Diagram for Figure 5.1: the sense paraphrases of the verb look shown as linked nodes; the links between them are not recoverable from the extracted text.]

<?page no="108"?>

R (rank) | SENSES | FREQ. | FEAT. RATIO | OVERLAP RATIO | PROT. RATIO
1 | direct your gaze towards someone or something or in a specified direction | 5,742 | 677.92 | 23.91 | 6,443.83
2 | have the appearance of | 3,045 | 282.64 | 21.50 | 3,349.14
3 | attempt to find | 1,860 | 248.48 | 25.97 | 2,134.45
4 | think of or regard in a particular manner | 894 | 133.78 | 23.60 | 1,051.38
5 | examine and consider | 240 | 42.98 | 23 | 305.98
6 | move round in order to investigate something | 217 | 38.19 | 18.26 | 273.45
7 | investigate in great detail | 173 | 28.38 | 19.72 | 221.10
8 | show likelihood of | 165 | 22.73 | 20.77 | 208.50
9 | eagerly await something or someone | 129 | 23.40 | 18.69 | 171.09
10 | rely on someone or something | 124 | 26.57 | 16.60 | 167.17
11 | take care of someone or something | 110 | 16.61 | 16.53 | 143.14
12 | evaluate someone carefully | 105 | 24.20 | 9.48 | 138.68
13 | view someone with superiority | 38 | 11.75 | 11.64 | 61.39
14 | peruse a book or other written material | 37 | 11.53 | 10.09 | 58.62
15 | ignore wrongdoing | 37 | 9.92 | 8.76 | 55.68
16 | inspect something briefly | 34 | 10.80 | 7.86 | 52.66
17 | for a structure [...] to have a view or outlook in a specified direction | 25 | 6.45 | 20.65 | 52.09
18 | quickly take notice | 29 | 8.60 | 6.50 | 44.10
19 | search for and find particular information in a piece of writing | 27 | 8.84 | 7.47 | 43.31
20 | have the appearance of being as old as you are | 18 | 5.22 | 8.18 | 31.39
21 | observe an event without getting involved | 20 | 7.19 | 4.17 | 31.36
22 | have respect for someone | 16 | 7.96 | 6.50 | 30.46
23 | suffer a setback | 11 | 3.98 | 4.67 | 19.65
24 | make future plans | 5 | 3.37 | 2 | 10.37
25 | evaluate someone or something with a quick glance | 1 | 1 | 6 | 8
26 | reminisce past events | 4 | 2.09 | 1.57 | 7.65
27 | appear your usual self | 2 | 1.61 | 2.08 | 5.69
28 | express a perceived air of superiority | 2 | 1.43 | 1.80 | 5.23
29 | bring an improvement to a situation | 2 | 1.47 | 1.73 | 5.20
30 | express or show something to someone | 1 | 1 | 3 | 5
31 | observe someone without showing embarrassment or fear | 1 | 1 | 3 | 5
32 | pay a short visit | 1 | 1 | 2.52 | 4.52
33 | pay a social visit while going somewhere else | 2 | 1.29 | 1.04 | 4.33
34 | expect (hope) to do something | 1 | 1 | 1.50 | 3.50
35 | ignore someone or something by pretending not to notice them | 1 | 1 | 0.95 | 2.95

Table 5.5 Ranking of senses according to their prototypicality - prototypicality ratios.

<?page no="109"?>

The question that comes up next is: which is the second, third or fourth most prototypical sense and how to attest that quantitatively? For an answer we can revert to Table 5.5 above which gives us the combined values of the frequency of the senses, their feature ratios, and their overlap ratios (all simply added together). We termed this combination of influences the prototypicality ratio.
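The prototypicality ratio described above, a plain sum of the three measures, can be sketched as follows. The input values are copied from Table 5.5 for four senses and deliberately listed out of order; the function names are assumptions made for this illustration.

```python
def prototypicality_ratio(frequency, feature_ratio, overlap_ratio):
    """Unweighted sum of frequency, feature ratio and overlap ratio."""
    return frequency + feature_ratio + overlap_ratio


# (sense, frequency, feature ratio, overlap ratio), values from Table 5.5.
ROWS = [
    ("attempt to find", 1860, 248.48, 25.97),
    ("direct your gaze ...", 5742, 677.92, 23.91),
    ("examine and consider", 240, 42.98, 23.0),
    ("have the appearance of", 3045, 282.64, 21.50),
]


def rank_senses(rows):
    """Return (sense, prototypicality ratio) pairs, highest ratio first."""
    scored = [
        (sense, round(prototypicality_ratio(freq, feat, over), 2))
        for sense, freq, feat, over in rows
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for position, (sense, ratio) in enumerate(rank_senses(ROWS), start=1):
        print(f"{position}. {sense}: {ratio}")
```

Because the table values are already rounded, a recomputed sum can differ from the printed prototypicality ratio in the last decimal for some rows (for example 25 + 6.45 + 20.65 against the printed 52.09).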
The distances of the senses from the prototypical sense and from each other, in terms of their own levels of prototypicality, should correspond to the distances between the given prototypicality ratios. The ranking given in Table 5.5 should then correspond to the actual levels of category membership of all of the senses of the verb look. Following from the evaluation of the prototype status of the ‘direct your gaze towards someone or something or in a specified direction’ sense, this claim should also be put to the test. If we recall, the prototype was evaluated through two procedures: a descriptive qualitative overview of the levels of prototypicality and information on the historical origins of the senses. Due to the lack of native speaker elicitation data or language acquisition information on the verb look, and due to the insistence of corpus-based semantics on keeping the description at the linguistic level, we will assume, as we did in the case of the prototype, that the declining levels of frequency and of the other parameters that the other 34 senses exhibit also signify the corresponding declining levels of entrenchment, the declining stages of their acquisition (children were exposed to the senses in frequency equivalent to their respective frequency of use), and the declining levels of activation. The qualitative evaluation, similar to the one presented for the prototypical sense, will stem from the description of senses given previously in Chapter 4 and in both Figures 4.5 and 5.1. Hence, Table 5.6 below represents the hierarchy of the senses according to the stated qualitative descriptive analysis alone (the rank representing ordered levels of prototypicality). Basically, Table 5.6 was built up on the deliberations on observable semantic connections, subjectively interpreted while looking at the relationships of the senses with the core reading and amongst each other.
In such a fashion we can again start with a proposition that the physical act of ‘direct[ing] your gaze towards someone or something or in a specified direction’ is to be seen as the core one and as such can be understood as the basis for sense extension into all of the other 34 identified senses. These sense extensions are to serve as our original basis for judging prototypicality: if we can observe that one or more senses extend semantically out of another one, we can say that they are less salient and hence less prototypical.
<?page no="110"?>
R | SENSES
1. | direct your gaze towards someone or something or in a specified direction
2. | have the appearance of; think or regard in a particular manner
3. | examine and consider
4. | attempt to find
5. | move round in order to investigate something; investigate in great detail; show likelihood of; eagerly await something or someone; rely on someone or something; reminisce past events; take care of someone or something; observe someone without showing embarrassment or fear; peruse a book or other written material; make future plans; for a structure [...] to have a view [...] in a specified direction; inspect something briefly; quickly take notice; search for and find particular information in a piece of writing; have the appearance of being as old as you are; observe an event without getting involved; have respect for someone; view someone with superiority; evaluate someone carefully; evaluate someone or something with a quick glance; suffer a setback; appear your usual self; bring an improvement to a situation; express a perceived air of superiority; express or show something to someone; ignore wrongdoing; pay a social visit while going somewhere else; pay a short visit; expect (hope) to do something; ignore someone or something by pretending not to notice them
Table 5.6 Coarsely-grained hierarchy of senses based on our qualitative analysis.
<?page no="111"?> One more marker of saliency was whether the given sense extends further into more senses and whether those extensions branch off further. The more branching a sense is, the more salient we can suppose it is going to be. That is why the division of prototypicality based on our qualitative deliberations is so coarsely grained. Once the criterion of sense extension had been exhausted, hardly any other criteria were left for grading the prototypicality levels of the non-branching senses more finely. Hence, we were forced to group them all into four large clusters of category membership: directly below the prototype reading we have the two senses ‘have the appearance of’ and ‘think or regard in a particular manner’ as branching nodes of the first order, seeing that they extend into other senses and have no apparent overarching senses except the prototypical one. Then at level three of prototypicality we have the branching node of the second order, the ‘examine and consider’ sense. It spreads out from ‘think or regard in a particular manner’, further extending into the branching node of the third order ‘attempt to find’, which in turn extends into some of the other non-branching senses. All of the non-branching senses then form one big cluster of senses extending from different more prototypical senses. They cannot be subdivided further in terms of their prototypicality based on introspection alone, which is quite an unsatisfactory result and does not tell us much comparison-wise. Following our reasoning of qualitatively investigating the levels of prototypicality in this evaluation step, we can say that, in order to consider the prototypicality ranking given in Table 5.5 as probable, it should correspond (at least for the most part) to the deliberations presented in Figure 5.1 and Table 5.6.
We can see that there are not too many discrepancies, at least when it comes to the more prototypical senses which are divided finely enough. Apart from the correspondence for the offered prototype reading (which has been confirmed as probable in this manner previously), we can see that the suggested level two of prototypicality also shows similarities based on both quantitative and qualitative criteria. The sense ‘have the appearance of’ holds the second place in the prototypicality hierarchy according to both criteria. The ‘think or regard in a particular manner’ sense does come slightly lower in the corpus-based hierarchy (as fourth in prototypicality), but it is relatively close to the shared second place arrived at through introspection. The ‘examine and consider’ sense is also lower in the corpus-based prototypicality ranking (fifth rather than third), but again only by two degrees. The sense ‘attempt to find’, on the other hand, differs by only one degree between the qualitative hierarchy and our <?page no="112"?> quantitative outline (fourth and third place respectively). As for the last large cluster of prototypicality given in Table 5.6 (senses of rank 5), all we can do is confirm that they all do come after the previously described five senses in both suggested hierarchies. We are prevented from a more meaningful comparison by the fact that the qualitative analysis did not provide us with enough information to have a more finely-grained division within this last cluster. It is quite impossible to state, looking at the readings introspectively, whether the ‘investigate in great detail’ sense, for example, is of a higher prototypical order than the related sense ‘peruse a book or other written material’. There is no indication of the possible paths and order of semantic extension despite the obvious evidence of semantic relatedness. Hence, at this point we can only turn to the evaluation on the basis of etymological origins for perhaps a more complete result.
Table 5.7 provides us with information on the first recorded use of each of the senses (according to the OED). The idea behind this criterion is that the levels of prototypicality given in both Tables 5.5 and 5.6 will match the historical development from one sense to the other. If we compare the rankings given in Table 5.5, Table 5.6, and finally the ones in Table 5.7 we should expect a close match if not a perfect one. Given the unreliability of etymological data, both from a source and a semantic point of view, we can expect a certain amount of noise in them. To compensate for such noise, and for any other noise inherent even in the most solid linguistic (corpus) data, we will observe the correspondences in terms of a degree of freedom extending to three hierarchical levels of difference. Therefore, if the suggested levels of prototypicality differ by up to three degrees according to the various criteria we employ, they will still be considered a match. Table 5.8 sums up our extensive comparison. There are three possible outcomes: the quantitatively, qualitatively, and etymologically suggested ratings could all three be found to match (indicated in the table as 3/3); two of the suggested degrees could be seen as corresponding (given as 2/3); and no correspondence might be found (0/3). To reiterate, a window of up to three prototypicality degrees of separation was observed as an indicator of correspondence in order to account for the said potential noise in the etymological and/or quantitative data. As we can see, there is already one complete correspondence we established earlier - the core reading itself.
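The matching rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the pairwise reading of the rule are ours, and any case in which at least one pair of rankings (but not all three pairs) falls within the window is reported as 2/3. The rank triples used in the usage lines are taken from Table 5.8.

```python
from itertools import combinations

WINDOW = 3  # ranks differing by up to three degrees still count as a match


def correspondence(quant: int, qual: int, etym: int, window: int = WINDOW) -> str:
    """Return '3/3', '2/3' or '0/3' for one sense, given its three ranks."""
    ranks = (quant, qual, etym)
    # Count how many of the three pairs of rankings lie within the window.
    matching_pairs = sum(abs(a - b) <= window for a, b in combinations(ranks, 2))
    if matching_pairs == 3:
        return "3/3"
    if matching_pairs >= 1:  # assumption: any partial agreement counts as 2/3
        return "2/3"
    return "0/3"


# Rank triples (quantitative, qualitative, etymological) as listed in Table 5.8.
print(correspondence(5, 3, 2))    # examine and consider
print(correspondence(2, 2, 9))    # have the appearance of
print(correspondence(9, 5, 18))   # eagerly await something or someone
```

Under this reading the three calls yield 3/3, 2/3 and 0/3 respectively, in line with the corresponding rows of Table 5.8.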
Apart from that, there are in total 17 correspondences between two out of the three suggested prototypicality hierarchies - 1 close correspondence of all three criteria; 6 correspondences between the quantitative and qualitative analyses (already reported on); 6 correspondences between the <?page no="113"?> quantitatively and etymologically suggested prototypicality levels; and 4 correspondences between the qualitatively obtained degrees and the etymologically suggested ones.
R | SENSE | 1st record
1. | direct your gaze towards someone or something or in a specified direction | 1000
2. | examine and consider | 1300
3. | evaluate someone or something with a quick glance | 1300
4. | observe someone without showing embarrassment or fear | 1375
5. | reminisce past events | 1375
6. | expect (hope) to do something | 1377
7. | quickly take notice | 1390
8. | attempt to find | 1394
9. | have the appearance of | 1440
10. | think of or regard in a particular manner | 1548
11. | have respect for someone | 1548
12. | investigate in great detail | 1586
13. | inspect something briefly | 1590
14. | show likelihood of | 1593
15. | suffer a setback | 1599
16. | pay a short visit | 1604
17. | rely on someone or something | 1611
18. | eagerly await something or someone | 1626
19. | for a structure or part of body to have a view or outlook in a specified direction | 1656
20. | ignore wrongdoing | 1666
21. | search for and find particular information in a piece of writing | 1692
22. | view someone with superiority | 1711
23. | express or show something to someone | 1727
24. | move round in order to investigate something | 1754
25. | bring an improvement to a situation | 1806
26. | appear your usual self | 1828
27. | have the appearance of being as old as you are | 1828
28. | pay a social visit while going somewhere else | 1852
29. | observe an event without getting involved | 1872
30. | peruse a book or other written material | 1887
31. | evaluate someone carefully | 1892
32. | take care of someone or something | 1892
33. | express a perceived air of superiority | 1940
34. | ignore someone or something by pretending not to notice them | 1959
35. | make future plans | 1963
Table 5.7 The order of etymological origins of the 35 identified senses of the verb look according to the OED.
<?page no="114"?>
SENSES | QUANT. | QUAL. | ETYM. | CORR.
direct your gaze towards someone or something or in a specified direction | 1. | 1. | 1. | 3/3
have the appearance of | 2. | 2. | 9. | 2/3
attempt to find | 3. | 4. | 8. | 2/3
think of or regard in a particular manner | 4. | 2. | 10. | 2/3
examine and consider | 5. | 3. | 2. | 3/3
move round in order to investigate something | 6. | 5. | 24. | 2/3
investigate in great detail | 7. | 5. | 12. | 2/3
show likelihood of | 8. | 5. | 14. | 2/3
eagerly await something or someone | 9. | 5. | 18. | 0/3
rely on someone or something | 10. | 5. | 17. | 0/3
reminisce past events | 11. | 5. | 5. | 2/3
take care of someone or something | 12. | 5. | 32. | 0/3
observe someone without showing embarrassment or fear | 13. | 5. | 4. | 2/3
peruse a book or other written material | 14. | 5. | 30. | 0/3
make future plans | 15. | 5. | 35. | 0/3
for a structure [...] to have a view or outlook in a specified direction | 16. | 5. | 19. | 2/3
inspect something briefly | 17. | 5. | 13. | 0/3
quickly take notice | 18. | 5. | 7. | 2/3
search for and find particular information in a piece of writing | 19. | 5. | 21. | 2/3
have the appearance of being as old as you are | 20. | 5. | 27. | 0/3
observe an event without getting involved | 21. | 5. | 29. | 0/3
have respect for someone | 22. | 5. | 11. | 0/3
view someone with superiority | 23. | 5. | 22. | 2/3
evaluate someone carefully | 24. | 5. | 31. | 0/3
evaluate someone or something with a quick glance | 25. | 5. | 3. | 2/3
suffer a setback | 26. | 5. | 15. | 0/3
appear your usual self | 27. | 5. | 26. | 2/3
bring an improvement to a situation | 28. | 5. | 25. | 2/3
express a perceived air of superiority | 29. | 5. | 33. | 0/3
express or show something to someone | 30. | 5. | 23. | 0/3
ignore wrongdoing | 31. | 5. | 20. | 0/3
pay a social visit while going somewhere else | 32. | 5. | 28. | 0/3
pay a short visit | 33. | 5. | 16. | 0/3
expect (hope) to do something | 34. | 5. | 6. | 2/3
ignore someone or something by pretending not to notice them | 35. | 5. | 34. | 2/3
Table 5.8 Correspondences between the corpus-attested levels of prototypicality of senses of the verb look, the qualitative analysis of the category membership, and their first recorded usages.
<?page no="115"?> The prototypicality data presented in Table 5.5 (obtained as an outcome of the corpus-based semantic analysis of the verb look and expressed as a combination of the frequencies of occurrences, feature ratios, and the overlap ratios of the senses) was shown to be generally inadequate when evaluating it against the different possible criteria used for attesting prototypicality. Excluding the prototype sense itself, we were able to attest 7 (out of 34) relatively close similarities between the quantitative and qualitative analysis, though further investigation was hindered by the coarsely-grained hierarchy obtained through introspection. A comparison with etymological data also yielded 7 (out of 34) confirmed close correspondences (6 of them completely different than the ones suggested by the correspondences between the quantitative and qualitative hierarchies). The question is whether we should then consider the levels of prototypicality presented (other than the prototype itself) as wrong? The answer should be “Not yet”: before we discount the given levels of prototypicality we must realize that neither the qualitative analysis nor the etymological data represent a solid base for attesting category membership, as indeed it seems that there is no solid base to do so at all.
Any criteria or approach we might have used (including native speaker intuition of category membership or any kind of elicitation experiments) could not have in fact been taken as conclusive evidence on prototypicality (even if it was available for the senses of the verb look) because of the fuzziness of the categories and the strong influence of personal and cultural experience on identifying family resemblance (we can recall here Labov’s cup example (1973)). The unfavorable and inconclusive evaluation of the corpus-based semantic account of prototypicality of senses together with an apparent lack of ways to conduct the said evaluation in a more meaningful and conclusive manner invites us to yet another discussion of theoretical and practical implications.
5.4 Theoretical and practical implications
If we disregard the etymological data used as a means of testing the prototypicality hierarchy as unreliable and problematic source-wise, we can see that the prototypicality of senses based on corpus data alone seems to show promise. The qualitative analysis almost fully confirmed the more finely grained top members of the prototypicality outline, but was too coarsely grained in total to properly attest all of the senses. However, the main <?page no="116"?> reason why this corpus-based methodology of accounting for levels of category membership can and should be seen as credible is that it is firmly grounded in (corpus) frequency of occurrence (in accordance again with Geeraerts (1988); Aitchison (1998); Schmid (2000); and Gilquin (2006)). Corpus frequency of a sense is linked, as mentioned earlier, to a generally higher frequency of use and hence earlier acquisition, faster priming, and earlier elicitation. Frequency of use seems in fact to lie behind all of the mentioned modes of categorial analysis (both linguistic and extra-linguistic, including etymological data as well).
That is, it seems, the deciding factor why our corpus-based semantic analysis yielded a plausible result on the prototype status of the ‘direct your gaze towards someone or something or in a specified direction’ sense and on the hierarchy of most other senses of the verb look. Though other factors of the analysis, such as the gathered distributional data and the extent of contextual interplay (seen in the feature ratio and the overlap ratio), seem to also support this account of prototypicality based on frequency of occurrence alone, their influence can also be traced back to frequency of occurrence. If we look at Table 5.5 again, we can see that the higher the frequency of the sense, the higher its feature ratio and overlap ratio (and automatically the prototypicality ratio as well) tend to be. The trend seems quite easy to explain - we have already argued that the more frequently a sense is used, the more it is expected to appear in different semantic and syntactic contexts. Conversely, the more specific a sense is in its use, the less frequent it will be and the need for its appearance in more diverse contexts will be less pronounced as well, sometimes being highly specialized and contextually limited. The road indicated also seems to be not from cognitive salience towards linguistic salience, but the other way around. This overall dependence of all the discussed contextual factors on the frequency of use actually seems to be indicating that the plausibility of the procedure in identifying the prototypicality of senses rests solely on the frequency of occurrence information. This reason can perhaps also be seen behind the fact that the outline of the degrees of category membership offered by corpus data was fully confirmed neither by the etymological comparison nor (for the most part) by the qualitative analysis.
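One way to check how closely the feature and overlap ratios follow frequency is a rank correlation. The sketch below uses Spearman's rho, which is not part of the original analysis and is introduced here only as an illustration; the helper names are ours, and the simple formula assumes no tied values, which holds for the first ten rows of Table 5.5 used as input.

```python
def ranks(values):
    """Rank values from largest (1) to smallest; assumes there are no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]


def spearman(xs, ys):
    """Spearman's rho for two equally long sequences without ties."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# First ten rows of Table 5.5 (frequency, feature ratio, overlap ratio).
freq = [5742, 3045, 1860, 894, 240, 217, 173, 165, 129, 124]
feature = [677.92, 282.64, 248.48, 133.78, 42.98, 38.19, 28.38, 22.73, 23.40, 26.57]
overlap = [23.91, 21.50, 25.97, 23.60, 23, 18.26, 19.72, 20.77, 18.69, 16.60]

rho_feature = spearman(freq, feature)
rho_overlap = spearman(freq, overlap)
print(f"frequency vs feature ratio: {rho_feature:.2f}")
print(f"frequency vs overlap ratio: {rho_overlap:.2f}")
```

For these ten rows the coefficients come out at roughly 0.95 and 0.82, which suggests that the feature ratio tracks frequency more tightly than the overlap ratio does. The figures describe this subset only and say nothing about significance.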
Since it has been recorded that prototypicality often seems to be counter-intuitive, a degree of discrepancy between the various modes of evaluation (especially between the qualitative and quantitative ones) is perhaps to be expected. The inability to objectively and unambiguously evaluate the corpus-based hierarchy of prototypicality (or any outline of category membership for that matter) and the <?page no="117"?> 117 fact that frequency of occurrence seems to be the major factor influencing all modes of categorical analysis lead us to the theoretical and practical issues of using solely distributional linguistic data in identifying levels of sense prototypicality. If we evoke the maxim of knowing a word by the company it keeps (Firth 1957: 11) once again and expand the metaphor slightly, we will see that the more a sense is seen out and about, the more friends it is going to have. It will be the first and the most common one we meet and the easiest to recognize later; people will remember it better when it comes to distinguishing it from other similar senses they may encounter. It seems that knowing the immediate company could be sufficient to know the category structure of a given word. The practical point of view, however, begs the question of how much knowledge about that immediate company (linguistic microcontext) is needed for the identification of the prototype sense and the levels of sense prototypicality. How extensive must the corpus-based semantic analysis be in order for the microcontext to be able to point to the category structure of the category of polysemy sufficiently and accurately? Do we really need to attest all of the contextual features and identify all of the points of intersection between the senses (compiling feature ratios, overlap ratios, and calculating prototypicality ratios) or might a less extensive procedure suffice? 
The indication, stemming from Table 5.8, seems to be that a far smaller amount of data is necessary - the frequency of occurrence of senses appears to be quite enough, for several reasons. The first reason is that frequencies of occurrence, as stated before, seem to be (for the most part at least) the major influencing factor on other indicators of category membership: the more prototypical senses are acquired by children before other members of the category since they are more frequently used around them during the language acquisition stage. More prototypical readings are produced first in psychological priming experiments precisely because they are most frequent in use, closest to the surface, and easiest to activate in the mental lexicon. Frequency of occurrence also accounts for the fact that the more prototypical senses are also easiest to memorize and recognize and are hence perceptually most salient. Even looking at etymological data 30 one can stipulate that their earlier origins are caused by the frequency of the phenomena they had to denote which was obviously higher than the other denotates the same lexical items had to denote 31 - the problem lies in the unreliability of their first recorded uses in etymological dictionaries.
30 The unreliability of etymological data comes more likely from the unreliability of the original records of usages rather than being inherent in the data themselves.
<?page no="118"?> The second reason is that the linguistic criteria such as the extent of interplay of one reading with contextual features and with other senses (feature ratios and overlap ratios), as explained previously, stem directly from frequency of occurrence as well. We have seen quite clearly that the more frequent the sense is, the more need it will have to appear in as wide a contextual framework as possible.
This width of interplay will, in turn, be the cause of more overlap with other senses, each of them developing their own width of contextual interplay related to their frequency of occurrence. These practical considerations which led us to the realization of the importance of frequency of use in defining levels of category membership lead us now to some theoretical implications. We have already seen that it is perhaps not necessary to perform the full corpus-based analysis for a credible outline of the degrees of prototypicality of a polysemous item, as it is certainly impractical, extremely time-consuming, and would result in roughly the same contour. So rather than the question of whether one needs all of the obtained contextual information for attesting the levels of prototypicality (for which we saw mere frequency of occurrence would seem to suffice), the real theoretical issue is whether we need this set of data to better describe the semantic and the syntactic links between the members of the category and so further describe the nature and reasons of their obtained hierarchical outline. In essence, the big question is whether prototypicality has any connection to meaning extensions or is rather a frequency-caused hierarchy. The complete answer to this question appears to be beyond the scope of the analysis presented here and still remains to be fully addressed in linguistics. The indications are, as seen in the successful partial comparison between the quantitative and qualitative outline of the senses and as we will partly answer in the following chapter on lexical networks, that there is a link between meaning extensions, frequency of use, and the degree of prototypicality of senses.
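The claim that frequency alone yields roughly the same contour can be illustrated by ordering a handful of senses twice, once by raw frequency and once by prototypicality ratio. The sketch below is an illustration only: the helper name is ours, and the eight rows are a selection from Table 5.5 (ranks 1 to 4 and 16 to 19), not the full data set.

```python
# (label, frequency, prototypicality ratio) for eight rows of Table 5.5.
rows = [
    ("direct your gaze towards someone or something or in a specified direction",
     5742, 6443.83),
    ("have the appearance of", 3045, 3349.14),
    ("attempt to find", 1860, 2134.45),
    ("think of or regard in a particular manner", 894, 1051.38),
    ("inspect something briefly", 34, 52.66),
    ("for a structure [...] to have a view or outlook in a specified direction",
     25, 52.09),
    ("quickly take notice", 29, 44.10),
    ("search for and find particular information in a piece of writing", 27, 43.31),
]


def order_by(column: int) -> list[str]:
    """Labels sorted from the highest to the lowest value in the given column."""
    return [row[0] for row in sorted(rows, key=lambda row: row[column], reverse=True)]


by_frequency = order_by(1)
by_ratio = order_by(2)

# Senses whose position differs between the two orderings.
moved = [label for label in by_ratio
         if by_frequency.index(label) != by_ratio.index(label)]
print(f"{len(moved)} of {len(rows)} senses change position")
```

In this selection the five highest positions are identical in both orderings, and only the three lowest senses swap places, the 'for a structure [...]' sense being lifted by its comparatively high overlap ratio.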
All we can say in the end and within the scope of corpus-based semantics, if we re-visit the Firthian maxim (1957: 11), is that when it comes to prototypicality of senses it does seem that the maxim applies quite well.
31 It is quite plausible to imagine that naming the movement of your eyes in order to see something (expressed by the prototypical sense ‘direct your gaze towards someone or something or in a specified direction’) was communicatively more important, and hence earlier in etymological origins because it was more frequently employed, than naming ‘to inspect something using your eyes and your thinking apparatus’ (expressed by the ‘examine and consider’ sense).
<?page no="119"?> It is seemingly enough to know the company a given word keeps for us to be able to tell its best exemplar sense and to learn more regarding the declining levels of category membership as well. In fact, we do not need to know more about it than how often a sense appears in public.
<?page no="120"?> 6. A corpus-based account of sense networks
Following directly from the previous chapter, a useful way of outlining semantic knowledge and furthering the insight on how meanings are organized on a categorical level seems to be through attempting to mirror the network structure in which knowledge is generally believed to be organized in our minds. This kind of modeling is also useful for gaining insight into the human semantic processing as well as into the interaction between semantic structures and semantic processing (Steyvers and Tenenbaum 2005: 42). It is meant to further complement the information on polysemy by including insight into how meanings are internally organized and how they interact with each other within one polysemy spectrum. Hence, we cannot stress enough the importance of such a criteria-based account of the possible make-up of lexical networks as corpus-based semantics seems to be proposing.
As we have seen, the original notion of a network organization of knowledge comes from psychology and early cognitive linguistics. A supposed system is imagined in our minds as a multi-modal and multidimensional representational network requiring complex processing for its structure to be activated (Gilquin 2008: 24). Mirroring this network of knowledge, the conceived semantic (lexical) network should function in a similar way. It should operate on the basis of schema induction on three levels (Evans and Green 2006: 280), the mappings of which connect given conceptual and lexical representational levels to a wider network of related knowledge. The conceptual network is by definition not as rich in information as the actual perceptual information, and the semantic representations are poorer still. That is why the extension to the semantic level of representation serves only as a prompt for the activation of higher-level constructions encoding the experiential information activated in the given mediatory conceptual network. Nonetheless, it is precisely this lowest representational level that stands out as important because it complements the conceptual information and serves as the initial activator of the whole schema. The initial activation itself is not straightforward since the structure of lexical networks is quite context-dependent in its semantic behavior. One semantic representation does not have a unique and exclusive link to one conceptual node or one point of perceptual information. We have seen that the construction of meaning is mostly pragmatically conditioned and that lexical meaning is a product of a unified discourse-related process. The argument is that lexemes in themselves are not self-sufficient pre-filled <?page no="121"?> 121 packages of meaning, carefully weighed-out and ready for use. 
Rather, we have stated that meaning construction is an online process which starts with encoding by the lexical network domain and ends with decoding and processing at the levels of the conceptual and perceptual network. Some instances of this online process are more directed and pre-arranged than others, depending on their entrenchment in the network itself. It is hence very important to see how the internal structure of the semantic network functions and contributes to this process and how we can formalize its representation through some form of linguistic description. The earliest work on the structuring of lexical networks saw them as simple hierarchical tree organizational structures where the connections between the branches and the levels of the network were based on whether or not a certain meaning belongs to a given class (such as ANIMAL, PLANT, etc.), starting at the most general possible level to which they can be seen as belonging (Collins and Quillian 1969: 460). The system seemed to work well when applied to simple taxonomical relations such as ‘a dog is a mammal’ for example (Figure 6.1).
Figure 6.1 Hierarchical tree-structure of a lexical network (adapted from Collins and Quillian 1969).
Such a simple inheritance structure does not, however, allow for much extension. As such it would be ill equipped to deal with the attested fuzziness found within almost every category. If one attempted to place say a ‘dragon’ within such a structure, its one-dimensional architecture would not allow for the necessary extensions. Therefore, a logistic system which would be able to represent the multi-dimensional organization of <?page no="122"?> knowledge was necessary. One approach to building such a network structure is the arbitrarily structured lexical network model (Steyvers and Tenenbaum 2005: 43). According to this model, the networks themselves have no discernible hierarchy, no beginning or end.
The lexical concepts within them are connected by some form of associative relationship, be it memory activation elicited by priming (Anderson 1983: 265) or some other semantically or cognitively motivated relationship used in describing the nature of human semantic memory. Such generic associative network organization is a relatively well-accepted form of representing lexical (and conceptual) networks. However, regardless of the model of the lexical network we subscribe to (whether hierarchical or arbitrary), the more important issue is what constitutes the links between the members of any given lexical network. It is a contested issue not to be addressed so easily. The problem of representing and defining the workings of semantic memory and knowledge acquisition and activation is an extremely complex one. Ultimately it requires looking into multiple facets of scientific endeavor such as neurology, psychology, biology, etc. (Steyvers and Tenenbaum 2005: 42) in the search for a full answer. The research presented here focuses only on the dimension of the principles governing links amongst readings of semantic representations alone. This level of semantic processing does not necessarily reflect the underlying (potentially) active conceptual structure but rather gives us only insight into the organization of the surface lexical network. We will attempt to gain such insight by utilizing both of the possible network structures: the hierarchical and the arbitrary construction alike. The criteria proposed for defining and describing the links between senses comprising the network stem from both the general semantic discussion in Chapter 2 (basically that context is in fact the major agent in meaning creation and extension) as well as from the Lexical Network theory which observes the different senses within a polysemous lexeme as an interconnected network of related aspects of unified meaning (Norvig and Lakoff 1987: 201), tying itself nicely to the prototype theory.
The concrete realization of these links, from the point of view of applied corpus-based semantics, will be slightly different for the two types of network structures. Within the hierarchical network construction, the units of representation will be the corpus-based hierarchical extensions from more prototypical to less prototypical senses (given originally in Table 5.5). The arbitrary network, on the other hand, will see the links as realized through the extent of the overlapping contextual distribution of the <?page no="123"?> senses and will understand senses as extending from larger to smaller overlap ratios.
6.1 Quantitative analysis
Following the discussion given above, we initially propose two contextually-motivated conditions which can direct the structure of a lexical network - one is the prototypicality of senses (in line with the Lexical Network theory and resulting in a hierarchical network type) and the other is the extent of the shared context in which they appear (which results in an arbitrary network construction). A quick visit to Table 5.5 can provide us with the necessary information on our ordered levels of prototypicality of the attested 35 senses of the verb look. Graphically represented, the category membership of the senses of the verb look would be something along the lines of Figure 6.2, where the numbers on the right represent the levels of prototypicality (1 to 35, according to the number of attested senses) while the codes in the containers represent the senses according to the numbers given originally in Table 3.1 (and in Appendix 1). The problem with this kind of a simplified representation of the lexical network of the verb look is again that it does not give us much information on the nature of the actual links between the senses - the nature that would provide information on the semantic value and cause of family resemblance.
Such information is essential if we are to discover more about the criteria that condition the extension of an existing sense and the production of a new one. It follows that we cannot base our hierarchical network on prototypicality data alone. We can amend our proposal by additionally suggesting that we use the identified weighted contextual features as the criteria for connecting the senses both within the arbitrary network structure and in the hierarchical lexical network. The hierarchical structure will simply merge the contextual information with the prototypicality data and thus obtain an additional dimension of relevance by including the information on the links between the senses. This kind of representation of the synchronic semantic links 32 between the senses would provide the necessary information on the way the senses overlap and are extended from each other. An attempt at the initial un-amended construction of the hierarchical network structure is given in Figure 6.3, with the contextual overlaps based on the strongest points of overlap between each two senses (obtained from Table 5.3).

32 The focus here is on synchronic links alone.

Figure 6.2 Levels of prototypicality of the 35 identified senses of the verb look based on their attested prototypicality ratios.

Now, before we can commence with the analysis of the network, we first have to tackle a problematic cluster of senses centering around ‘ignore wrongdoing’ (sense 22), ‘for a structure or part of body to have a view or outlook in a specified direction’ (sense 2), and ‘peruse a book or other written material’ (sense 6), as well as the senses stemming from them: ‘express a perceived air of superiority’ (sense 19), stemming immediately from sense 6, and ‘pay a social visit while going somewhere else’ (sense 33) and ‘ignore someone or something by pretending not to notice them’ (sense 4), both branching directly from sense 2.
The problem comes from their supposed levels of prototypicality which do not match their branching. As described before, the branching within the lexical network presented in Figure 6.3 is based on two measurements - one is our outline of prototypicality and the other is the non-hierarchical data of the highest points of overlap of the contextual features shared by each two observed senses. The idea is that when the highest point of overlap is identified amongst two senses the less prototypical (and presumably less general) sense should be extending from the more prototypical (and more general) one. Such is the case with almost all of the senses and the links presented in Figure 6.3. Only in the case of the given problematic senses, however, is there a discrepancy between their expected and their attested levels of network membership. To break down the problem, the sense (2) ‘for a structure or part of body to have a view or outlook in a specified direction’ shares more features with the ‘have the appearance of’ (13) sense than with any other. Since ‘have the appearance of’ (13) is more prototypical than ‘for a structure or part of body to have a view or outlook in a specified direction’ (2), the conclusion implied is that sense (2) should stem out of sense (13). That would not be such a structural problem were it not for the issue with senses ‘ignore wrongdoing’ (22) and ‘peruse a book or other written material’ (6). Both of these senses share most contextual features with ‘for a structure or part of body to have a view or outlook in a specified direction’ (2) and should be then seen as branching from it since it already branches out of ‘have the appearance of’ (13). 
However, both ‘ignore wrongdoing’ (22) and ‘peruse a book or other written material’ (6) have been attested as more prototypical than ‘for a structure or part of body to have a view or outlook in a specified direction’ (2), and according to our construction criteria it (sense 2) should be stemming from both of them, while already stemming from ‘have the appearance of’ (13), which is more prototypical than all of them.

Figure 6.3 The hierarchical semantic network based on the attested prototypicality and the highest overlap points of the contextual features.

The question is how to represent this within our lexical network (assuming that it is not a case of an error in the data). The answer can be found in the realization that the higher up the lexical network we go, the more contextually encompassing senses we will find. Accordingly, to locate the closest branching node one should try to connect any given sense to the hierarchically lowest node, since the higher related ones by definition encompass them both (seeing that their higher hierarchical position is originally based on their feature ratios and overlap ratios). Therefore, looking at the problematic senses, we can amend the overall hierarchical network accordingly by having ‘for a structure or part of body to have a view or outlook in a specified direction’ sense (2) branch out of ‘ignore wrongdoing’ sense (22) and ‘peruse a book or other written material’ sense (6) out of ‘have the appearance of’ (13), as done in Figure 6.4 below.

Having arrived at a more satisfactory construction of the hierarchical network at this stage, we can look at our construction in greater detail. The main cluster of grouping senses visible in Figure 6.4 (let us call it cluster alpha 33), which actually includes all of the senses, is the one centering on the prototypical sense ‘direct your gaze towards someone or something or in a specified direction’ (in code sense 1).
This is obviously not surprising, as it has already been identified as the central member of the category, contextually the most diverse, and the most frequently occurring one when interplaying with other senses. Additional helpful information emerges if we look at the senses branching out from this prototypical and all-encompassing one, initially focusing on the first level stemming directly from the prototype (the beta clusters). Several larger and branching clusters can be seen, as well as the senses that stem from the prototype but do not branch further on. The non-branching ones are ‘quickly take notice’ (31), ‘express or show something to someone’ (3), ‘eagerly await something or someone’ (28), ‘observe an event without getting involved’ (30), ‘examine and consider’ (9), and ‘show likelihood of’ (14).

33 Once again, the levels of senses presented in both networks by the terms alpha, beta, gamma, delta and epsilon represent a combination of their attested levels of prototypicality (based on their prototypicality ratios) and of whether they branch out (based on their contextual features) directly from the prototypical sense or from the senses already branching out from it (when looking at the hierarchical network). In the arbitrary network they stem from the most contextually encompassing (alpha) sense and follow the same line of reasoning.

Figure 6.4 The amended hierarchical semantic network based on the attested prototypicality and the highest overlap points of the contextual features.

Stemming from this first original cluster, there are three major (multiple) diverging senses: ‘have the appearance of’ (13), ‘attempt to find’ (11), and ‘think of or regard in a particular manner’ (8), accompanied by three minor ones (extending into only one or two senses): ‘investigate in great detail’ (10), ‘inspect something briefly’ (5) and ‘evaluate someone or something with a quick glance’ (12).
The next (third) level of senses (the gamma clusters), now stemming from the first branching (beta) ones, includes ‘expect (hope) to do something’ (17) branching from sense (12); ‘search for and find particular information in a piece of writing’ (34), ‘suffer a setback’ (27), ‘make future plans’ (23), ‘reminisce past events’ (26) and ‘bring an improvement to a situation’ (32) branching from sense (8); ‘move round in order to investigate something’ (7), ‘rely on someone and something’ (16), ‘take care of someone or something’ (25), ‘evaluate someone carefully’ (24), and ‘view someone with superiority’ (20) branching from sense (11); ‘peruse a book or other written material’ (6), ‘ignore wrongdoing’ (22), and ‘appear your usual self’ (15) branching from sense (13); ‘have respect for someone’ (35) branching from sense (10); and ‘pay a short visit’ (29) and ‘observe someone without showing embarrassment or fear’ (21) branching from sense (5). At the next, delta level of branching we find three senses: ‘have the appearance of being as old as you are’ (18) branching from sense (7), ‘for a structure or part of body to have a view or outlook in a specified direction’ (2) stemming from sense (22), and ‘express a perceived air of superiority’ (19) stemming from sense (6). The last, epsilon level of branching contains two senses, both branching from sense (2): ‘pay a social visit while going somewhere else’ (33) and ‘ignore someone or something by pretending not to notice them’ (4).

An attempt to construct an arbitrary lexical network of the verb look (given in Figure 6.5 below) does not bring much novelty to the discussion. The links between the senses are built from the same contextual data as the ones given in the hierarchical network above and represent the strongest points of overlap between the senses.
The criterion that differentiates this arbitrary construction from the hierarchical one is, if we recall, that the branching is only defined on the basis of inter-categorial similarity and feature ratios, lacking the hierarchical information based on prototypicality ratios. That is why the subsequent evaluation of the networks can go hand-in-hand as essentially the same types of semantic links (if we exclude the prototypicality ordering) are being evaluated.

Figure 6.5 The arbitrary network construction based on the highest overlap points of the contextual features.

This short description of the branching pattern still does not, however, explain enough about the nature of the links making up the given structures, but it has the potential to do so. A closer interpretation coupled with a more suitable evaluation will give us a chance to tap this potential.

6.2 Evaluation of the lexical network construction

The criteria for evaluating the success of the branching (which follows) will be two-fold: one criterion will be a qualitative analysis of the construction and its implications; the second one will be a comparison with the historical development of the senses (presented ranked in Table 5.6). 34 Hierarchical information is supposed to compare to the way meanings extended from one another etymologically, which was in turn motivated by their necessitated frequency of denotation and performed by extending their meanings in some manner. The part of the analysis referring to the prototypicality information will relate exclusively to the hierarchical network while all other qualitative deliberations apply equally well to both types of networks. The easiest dimension of the network to explain and the one that needs no particular evaluation (other than the one already presented in the previous chapter) is the alpha cluster, in both of the network constructions.
This cluster includes all of the 34 senses, which are all seen as stemming from the remaining 35th, core sense ‘direct your gaze towards someone or something or in a specified direction’. The role this prototype meaning plays in meaning production and meaning extension is easily observable and has been elaborated on extensively in the two previous chapters. Of much greater interest then are all of the other clusters of various levels branching out of this prototypical sense in addition to their own novel aspects of meaning extensions. The beta-level senses, stemming directly from the core sense, are the first in line to be appraised.

34 There was one more possibility of evaluating the given lexical networks and that was by looking at whether the extension from one sense to another presented in the constructed lexical network is a predictable one based on the features they differ in. It was not conducted here because of the enormous complexity of handling a matrix of 34 times 34 senses comprising an interrelated multi-dimensional network and involving tens and hundreds of feature types and thousands of feature tokens to share and differ in. However, extensive work on this problem is in progress and will be reported on in some future publication.

6.2.1 Beta-level senses

The first thing that can be confirmed at the outset of the qualitative analysis of this cluster of senses is that all of the readings seen as stemming directly from the prototypical one are at their intended and undisputed location within a lexical network of any construction. They are all, by default, below the first-order and most contextually diverse prototypical sense, from which, again by default, all other senses should be extending regardless of whether we observe it as the top of a hierarchy (in the hierarchical network) or as the center of the semantic scope (in the arbitrary network).
This means that we can evaluate their position in the hierarchical network as proper and focus on the way the readings of these second-level senses extend from this first possible node in both network structures. The first to be appraised are the non-branching beta-level senses: ‘quickly take notice’ (31); ‘express or show something to someone’ (3); ‘eagerly await something or someone’ (28); ‘observe an event without getting involved’ (30); ‘examine and consider’ (9); and ‘show likelihood of’ (14). The first part of this qualitative examination is fully based on the similar introspective discussion of the senses of the verb look given already in Chapters 4 and 5. It is easy to see how the senses ‘quickly take notice’ (31), ‘observe an event without getting involved’ (30), ‘express or show something to someone’ (3), ‘eagerly await something or someone’ (28), and even the sense ‘examine and consider’ (9) 35 could be seen as directly stemming from the prototypical sense ‘direct your gaze towards someone or something or in a specified direction’ (1) and not from some other lower branching node. When it comes to ‘show likelihood of’ (14), however, it is not easy to explain why it does not stem from ‘have the appearance of’ (13) - sense (13) is higher in the scale of prototypicality (in the hierarchical network), it has a higher feature ratio than the given sense (in both networks), and, most crucially, it also includes the passive semantic aspect of the ‘gaze being directed’ and ‘suffered by someone or something’ recognizable in ‘show likelihood of’ (14) as well.

35 Though it is a removed metaphorical extension and perhaps better placed somewhere else, as in under ‘think of or regard in a particular manner’ (8) and below it under ‘attempt to find’ (11).

Moving on, the following senses also branch out directly from the prototype, only they also branch out further themselves (the branching beta-level senses): ‘have the appearance of’ (13); ‘attempt to find’ (11); ‘think of or regard in a particular manner’ (8); ‘investigate in great detail’ (10); ‘inspect something briefly’ (5); and ‘evaluate someone or something with a quick glance’ (12). Again some of the senses are easily recognized as having no other more encompassing and more probable ancestry but the prototype meaning itself: ‘have the appearance of’ (13) and ‘think of or regard in a particular manner’ (8), for example. The other four senses seem not to belong to this branching node. The ‘attempt to find’ (11) sense is perhaps better placed as stemming from ‘think of or regard in a particular manner’ (8), seeing that it represents a further extension of the ‘direct your mental gaze’ aspect of meaning found already in the semantically (as well as hierarchically and feature ratio-wise) closer ‘think of or regard in a particular manner’ (8). The same goes for ‘investigate in great detail’ (10), ‘inspect something briefly’ (5), and ‘evaluate someone or something with a quick glance’ (12), all of which can be much better placed under their prototypicality antecedent ‘attempt to find’ (11) as extensions of the ‘use your physical and/or mental gaze to locate someone or something’ aspect of meaning (in its turn extended from ‘think of or regard in a particular manner’ (8), and ultimately, through this sense, from the prototype).

To sum up, out of the 12 senses belonging to this second, beta level of branching (in both the hierarchical and the arbitrary network), all stemming directly from the prototype, our qualitative analysis seems to concur with only 7 (‘express or show something to someone’ (3), ‘quickly take notice’ (31), ‘observe an event without getting involved’ (30), ‘eagerly await something or someone’ (28), ‘examine and consider’ (9), ‘have the appearance of’ (13), and ‘think of or regard in a particular manner’ (8)).
The other 5 (‘show likelihood of’ (14), ‘attempt to find’ (11), ‘investigate in great detail’ (10), ‘inspect something briefly’ (5), and ‘evaluate someone or something with a quick glance’ (12)) seem to better fit some other place of extension within the network as they already convey a certain amount of the original meaning seen in their semantic (prototypicality-wise and context-wise) superiors.

6.2.2 Gamma-level senses

The next level of senses (the third if we count the prototype alone as the alpha one), the gamma senses, are: ‘expect (hope) to do something’ (17) branching from sense (12); ‘search for and find particular information in a piece of writing’ (34), ‘suffer a setback’ (27), ‘make future plans’ (23), ‘reminisce past events’ (26) and ‘bring an improvement to a situation’ (32) branching from sense (8); ‘move round in order to investigate something’ (7), ‘rely on someone and something’ (16), ‘take care of someone or something’ (25), ‘evaluate someone carefully’ (24) and ‘view someone with superiority’ (20) branching from sense (11); ‘peruse a book or other written material’ (6), ‘ignore wrongdoing’ (22), and ‘appear your usual self’ (15) branching from sense (13); ‘have respect for someone’ (35) branching from sense (10); and ‘observe someone without showing embarrassment or fear’ (21) and ‘pay a short visit’ (29) branching from sense (5). In contrast to the previous beta level, this level of the lexical network needs to be investigated first in terms of whether all of the senses stemming from the previous branching level of senses are of the extension level they have actually been attested for and lower than their identified higher and more encompassing nodes within the hierarchical network. A quick glance at Figure 6.4 reveals that all the senses provided above do come, in terms of their attested prototypicality, below the respective sense from which they stem.
They also stem from their more contextually diverse superiors within the arbitrary network construction. Having this aspect of the networks confirmed, the second aspect of the qualitative evaluation of the semantic content of both types of networks naturally follows suit. Going in the order they were presented above, we can see that the extension from ‘evaluate someone or something with a quick glance’ (12) to ‘expect (hope) to do something’ (17) does not really fit, as a much better place of extension can be recognized in the ‘gaze being directed by someone or something’ aspect of sense (1) transfigured into a more metaphorical way of turning towards the future or past. This aspect of meaning is for example found also in the senses ‘make future plans’ (23), ‘eagerly await someone or something’ (28) (not in fact in this cluster), and ‘reminisce past events’ (26), all in fact belonging better as branching from the prototypical sense. Looking at the next set of extensions, this time from ‘think of or regard in a particular manner’ (8), which include ‘search for and find particular information in a piece of writing’ (34), ‘suffer a setback’ (27), ‘make future plans’ (23), ‘reminisce past events’ (26), and ‘bring an improvement to a situation’ (32), once again it is a plausible point of extension for some senses and quite an unlikely one for others. For example ‘search for and find particular information in a piece of writing’ (34) seems quite a likely candidate for the extension of the ‘direct your mental gaze’ aspect of the more encompassing sense ‘think of or regard in a particular manner’ (8) (even though it could perhaps fit better with another antecedent such as ‘attempt to find’ (11) and its meaning of ‘use your physical and/or mental gaze to locate someone or something’).
Similarly to this sense, the ‘make future plans’ (23) and ‘reminisce past events’ (26) senses also seem to convey the same aspect of ‘direct your mental gaze’, though again a better and slightly hierarchically lower slot would perhaps suffice (for example ‘examine and consider’ (9)). The other senses do not seem to have anything of this aspect of meaning within them - ‘suffer a setback’ (27) and ‘bring an improvement to a situation’ (32) both fit better under the prototypical sense itself as direct first-order metaphorical extensions. The third grouping, centering around ‘attempt to find’ (11), includes ‘move round in order to investigate something’ (7), ‘rely on someone and something’ (16), ‘take care of someone or something’ (25), ‘evaluate someone carefully’ (24), and ‘view someone with superiority’ (20). The only sense that can actually be recognized as related to its supposed point of semantic extension within this cluster is ‘move round in order to investigate something’ (7). Even though it was previously attested as less prototypical than ‘attempt to find’ (11), this sense intuitively seems to be better situated above it, as ‘move round in order to investigate something’ (7) seems to have a broader meaning (involving primarily directing your physical gaze) from which the more abstract ‘attempt to find’ (11) could be postulated as stemming. The other senses in this grouping seem to share very little with their supposed semantic antecedent and are again much better placed stemming directly from the prototype. The fourth sub-branch incorporates ‘peruse a book or other written material’ (6), ‘ignore wrongdoing’ (22), and ‘appear your usual self’ (15), all seen as being produced out of the ‘have the appearance of’ (13) sense.
Apart from the ‘appear your usual self’ (15) sense, which shares the aspect of the ‘gaze being suffered and attracted rather than directed’ with the ‘have the appearance of’ (13) sense and further extends it (which seems quite a successful and predictive branching point), the other two senses do not share any aspect of that intended meaning with sense (13). The sense ‘ignore wrongdoing’ (22) belongs much better to the prototype while ‘peruse a book or other written material’ (6) more likely branches out from the ‘examine and consider’ (9) sense. The penultimate node, with ‘have respect for someone’ (35) branching from ‘investigate in great detail’ (10), does not seem to be very plausible either, as there is very little semantic similarity to support the former sense stemming from the latter one. The likely point of origin for ‘have respect for someone’ (35) is once more the prototype itself. The last sub-branching claims that ‘observe someone without showing embarrassment or fear’ (21) and ‘pay a short visit’ (29) are best seen as extending from ‘inspect something briefly’ (5). While it can perhaps be argued that ‘pay a short visit’ (29) carries some of the meaning of ‘attempting to find someone or something’, it is much better suited somewhere else, together with the ‘observe someone without showing embarrassment or fear’ (21) sense, such as for instance under the prototypical sense, since it extends its original meaning before any others. So, summing up the qualitative evaluation of the gamma cluster, we can see that out of the 17 senses only 5 can be plausibly placed under their proposed branching points while the others, as described above, fit much better elsewhere within the network.
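The branching evaluated so far can also be written down compactly. The following is a minimal sketch, assuming the child-to-parent links listed in the text for the amended hierarchical network (Figure 6.4); the names `PARENT`, `path_to_prototype` and `level` are illustrative and not part of the original analysis. It derives the level of each sense (alpha to epsilon) from its distance to the prototype.

```python
from collections import Counter

# Child sense -> parent sense, transcribed from the branching described in the
# text for Figure 6.4. Sense 1 (the prototype) is the root and has no parent.
PARENT = {
    # beta: branching directly from the prototype
    31: 1, 3: 1, 28: 1, 30: 1, 9: 1, 14: 1,
    13: 1, 11: 1, 8: 1, 10: 1, 5: 1, 12: 1,
    # gamma
    17: 12,
    34: 8, 27: 8, 23: 8, 26: 8, 32: 8,
    7: 11, 16: 11, 25: 11, 24: 11, 20: 11,
    6: 13, 22: 13, 15: 13,
    35: 10,
    29: 5, 21: 5,
    # delta
    18: 7, 2: 22, 19: 6,
    # epsilon
    33: 2, 4: 2,
}

LEVEL_NAMES = ["alpha", "beta", "gamma", "delta", "epsilon"]


def path_to_prototype(sense):
    """Return the chain of senses from `sense` up to the prototype (sense 1)."""
    path = [sense]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def level(sense):
    """Name the level of a sense by its number of links to the prototype."""
    return LEVEL_NAMES[len(path_to_prototype(sense)) - 1]


# Number of senses per level, counting the prototype as the alpha level.
level_counts = Counter(level(s) for s in [1, *PARENT])
```

For instance, `path_to_prototype(33)` walks from sense 33 through senses 2, 22 and 13 up to sense 1, which places sense 33 at the epsilon level. Storing only the parent of each sense keeps the sketch close to the way the text describes the network, one branching at a time.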
6.2.3 Delta-level senses

The delta-level cluster is thrice removed from the original all-encompassing prototypical sense (and the alpha cluster it represents) and it stands as an even further extension of the original ‘directing of gaze’ kernel of meaning. The cluster includes very few cases: ‘have the appearance of being as old as you are’ (18) branching from sense (7); ‘for a structure or part of body to have a view or outlook in a specified direction’ (2) stemming from sense (22); and ‘express a perceived air of superiority’ (19) coming from sense (6). As with the previous clusters of senses, the first thing to assess with the delta-level senses is whether their attested prototypicality fits their branching within the hierarchical network and whether the differences in the extent of their contextual interplay correspond to the outline given within the arbitrary network. A brief inspection of Figure 6.4 tells us that they do. Following the set pattern of evaluation then, their qualitative analysis is intended to tell us more about the nature and plausibility of their represented relationships. Once we look at the first sense here, ‘have the appearance of being as old as you are’ (18), and try to establish how it stems from ‘move round in order to investigate something’ (7), there seems to be no viable direct connection between the supposed antecedent and the sense branching out of it. The ‘have the appearance of being as old as you are’ (18) sense belongs much better, and quite transparently so, under ‘have the appearance of’ (13), expanding on its ‘gaze being suffered and attracted rather than directed’ aspect of meaning. To establish how ‘for a structure or part of body to have a view or outlook in a specified direction’ (2) is produced by extending the sense ‘ignore wrongdoing’ (22) (using any semantic means available) is even harder.
The sense ‘for a structure or part of body to have a view or outlook in a specified direction’ (2) can more easily be seen as an extension of the prototype sense ‘direct your gaze towards someone or something or in a specified direction’ (1) with a change of the subject - moving from an animate subject doing the given watching and facing a certain direction in (1) towards an inanimate one in (2). The same line of argumentation goes for ‘express a perceived air of superiority’ (19) and its supposed branching node ‘peruse a book or other written material’ (6) - in the same way as the previous example, it is easier to understand as branching out of the most categorically overreaching prototype sense. The conclusion on this delta-level branching of senses is that out of the three senses it encompasses all can be discounted, according to our introspection, as having branched out of their proposed antecedents.

6.2.4 Epsilon-level senses

The senses of this fifth and last level are now four times removed from the prototypical sense and there are only two of them, both branching out from sense (2): ‘pay a social visit while going somewhere else’ (33); and ‘ignore someone or something by pretending not to notice them’ (4). Being of the last possible hierarchical level in the hierarchical lexical network (seen in Figure 6.4), there is no question of their level of prototypicality since they are by default lower than any other possible sense that may precede them hierarchically. The same goes for the contextual diversity in comparison with all other senses which precede them in the arbitrary network structure. Looking at them qualitatively, however, we get yet another unfortunate result for both of the networks.
We would be very hard pressed to argue successfully that ‘pay a social visit while going somewhere else’ (33) or ‘ignore someone or something by pretending not to notice them’ (4) share any of their immediate meaning with ‘for a structure or part of body to have a view or outlook in a specified direction’ (2). The sense ‘pay a social visit while going somewhere else’ (33) shares most of the immediate meaning with ‘attempt to find’ (11) as it conveys ‘an attempt to locate someone in order to meet them’ while ‘ignore someone or something by pretending not to notice them’ (4) is clearly best connected to the prototypical sense (1), conveying the ‘[not] direct[ing] your gaze’ aspect of meaning. Therefore, once again we have a case where none of the senses seem to fit their intended meaning extension points.

The fact is that out of the 35 attested senses of all levels only 8 (including the prototype) appear to us and our introspection as plausible in the place proposed by the lexical networks given in Figure 6.4 and Figure 6.5. The question forced upon us here is whether we should declare the corpus-based model of a lexical network inapplicable when it comes to the overall construction of lexical networks (both a hierarchical and an arbitrary one)? Before we are forced to reach that conclusion, there is one more avenue of evaluation we are committed to following.

6.2.5 Evaluation through historical development

The second criterion intended to be used in evaluating the given lexical network and the hierarchy of its meaning extensions is the etymological origin of each sense in comparison to the etymological origin (expressed here as earlier recorded use) of its predecessor sense (whether by prototypicality hierarchy or by feature ratio). Table 6.1 below provides the information on the correlations between these two sets of data (based on the etymological information from the OED).
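The comparison just described can be sketched in a few lines. This is only an illustration, restricted for brevity to the five delta- and epsilon-level senses; the years are the first recorded uses listed in Table 6.1, the predecessor links follow the branching of Figure 6.4, and the function name `etymological_mismatches` is hypothetical. Treating a sense as a mismatch only when it is recorded strictly earlier than its proposed predecessor is an assumption of this sketch.

```python
# First recorded uses (years, from the OED) as listed in Table 6.1 for the
# delta- and epsilon-level senses and their proposed predecessors.
FIRST_USE = {
    7: 1754, 22: 1666, 6: 1887,   # gamma-level predecessors
    18: 1828, 2: 1656, 19: 1940,  # delta-level senses
    33: 1852, 4: 1959,            # epsilon-level senses
}

# Sense -> proposed predecessor, following the branching of Figure 6.4.
PREDECESSOR = {18: 7, 2: 22, 19: 6, 33: 2, 4: 2}


def etymological_mismatches(predecessor, first_use):
    """Map each sense recorded earlier than its proposed predecessor
    to the number of years by which it predates that predecessor."""
    return {
        sense: first_use[parent] - first_use[sense]
        for sense, parent in predecessor.items()
        if first_use[sense] < first_use[parent]  # assumption: strict comparison
    }


mismatches = etymological_mismatches(PREDECESSOR, FIRST_USE)
print(mismatches)
```

With these values, the only sense flagged is sense 2, which is first recorded 10 years before its proposed predecessor, sense 22.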
The first thing to notice is that all of the beta-level senses (marked as β in Table 6.1 below) can again be confirmed as having a proper match between their etymological origin and their place within both of the network constructions, which is directly below the prototype sense.

| CLUSTER | SENSE | Y | SENSE | Y | SENSE | Y | SENSE | Y | SENSE | Y | SENSE | Y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| β | 31 | 1390 | 3 | 1727 | 28 | 1626 | 9 | 1300 | 14 | 1593 | 30 | 1872 |
| β | 8 | 1548 | 11 | 1394 | 13 | 1440 | 5 | 1590 | 10 | 1586 | 12 | 1300 |
| γ | 23 | 1963 | 7 | 1754 | 6 | 1887 | 21 | 1357 | 35 | 1548 | 17 | 1377 |
| γ | 26 | 1375 | 16 | 1611 | 15 | 1828 | 29 | 1604 | | | | |
| γ | 27 | 1599 | 20 | 1711 | 22 | 1666 | | | | | | |
| γ | 32 | 1806 | 24 | 1892 | | | | | | | | |
| γ | 34 | 1692 | 25 | 1892 | | | | | | | | |
| δ | 18 | 1828 | 2 | 1656 | 19 | 1940 | | | | | | |
| ε | 4 | 1959 | 33 | 1852 | | | | | | | | |

Table 6.1 The hierarchical levels of senses (in code) and their etymological order of origin (years of first recorded uses, marked by Y).

It also comes as the default because the prototype is by definition (and the evaluation given in Chapter 6) the etymologically earliest sense (the year 1000, OED) and all of the proposed 12 beta senses originate later. When we look at the gamma level (γ) we can still see that most of the senses do confirm their lower point of branching with only few exceptions - out of 17 senses belonging to this level 15 of them are etymologically younger than their proposed semantic predecessors while only 2 do not seem to fit: ‘observe someone without showing embarrassment or fear’ (sense 21) and ‘reminisce past events’ (sense 26). Within both the delta (δ) and the epsilon (ε) level senses there is only one sense which does not fit - ‘for a structure or part of body to have a view or outlook in a specified direction’ (2), at the delta level.
It is important to note, however, that even this one sense (out of five offered ones at these levels) misses its required etymological point of origin by only 10 years (the proposed predecessor ‘ignore wrongdoing’ (22) has its first recorded use in 1666 while ‘for a structure or part of body to have a view or outlook in a specified direction’ (2) is first recorded in 1656). All three discrepancies can perhaps be attributed to the problematic historical reliability of etymological data and the inaccessibility of written records from the given era. If we give this unreliability of etymological data further weight and discount all three cases of the senses not fitting the intended etymological and hierarchical slots as being erroneous identifications due to the mentioned noise in etymological information, we can say that the hierarchy of extension in the network has been almost fully confirmed (32 of 35 senses fitting perfectly).

However, having in mind at this point that the qualitative analysis discounted the success of the lexical network and that the etymological comparison confirmed it (for the most part), the ultimate question is then whether the construction of the given lexical (semantic) network was successful and useful or not?

6.3 Theoretical and practical implications

The issue of the construction of the lexical network and the role corpus-based semantic analysis played in its successfulness, similar to the problem of the success and accuracy of the prototypical hierarchy discussed in the previous chapter, is one more semantic consideration that cannot be answered conclusively enough and will leave many things open (as most theoretical accounts of semantic meaning seem to do in the end). We have seen that the qualitative analysis clearly revealed that most of the presented meaning extensions do not seem plausible at all, especially at the levels further removed from the prototype (gamma, delta, and epsilon).
The etymological data has, on the other hand, almost fully confirmed the accuracy of the network, at least of the hierarchically structured one. However, unlike the evaluation of the presented levels of prototypicality, whose uncertainty depends on various non-linguistic and non-semantic (cognitive) factors, the lexical network should, as it was presented and discussed within this chapter, focus much more on the lexical semantic aspect of senses (combining certain cognitive aspects only in its hierarchical construction). With such a focus we must give much more credence to the results of our qualitative analysis than we did in the previous two cases, when dealing with sense distinctiveness and category membership. That analysis suggests that the given lexical network, built in essence on the highest points of overlap between each of the senses, is an inadequate one. The meaning extensions seen within it mostly do not make much sense. The very structure (whether of hierarchical or arbitrary construction) would not serve as an aid in any disambiguation task or in any lexicographic endeavor, because the eventual sense production patterns could not be extracted from it. One small element of uncertainty (and hope for the method) remains though: if it were possible to construct an enormous multidimensional and multi-modal network containing all of the possible overlaps and non-overlaps between all of the senses together, and then to weight all of the links according to both their <?page no="141"?> distinctiveness and frequency of occurrence, perhaps a better and more accurate result could be expected. This extraordinarily complicated task (currently being attempted) raises a storm of theoretical and especially practical considerations. The practical implications concern the usefulness of distributional data for the construction of a lexical (semantic) network of any polysemous lexeme.
The implication we arrive at is that corpus-based semantics and contextual data alone are in fact not very useful. Here is why: the distributional data emerging from the corpus-based semantic analysis of the verb look did not help us in constructing a fully confirmable (or perhaps even plausible) hierarchical scale of prototypicality. At the same time, the procedure of attesting the points of overlap between the senses was very time-consuming and labor-intensive (not to mention the grueling procedures of manually tagging the sentences and extracting the raw results in the first place). All that hard work, which is difficult to imagine being readily employed in everyday lexicographic practice, in the end gave us no conclusive information. The semantic links obtained from the meeting points between the senses suggested by our contextual data had to be discounted as improbable on closer analysis, and the whole network had to be deemed of little lexical use. It cannot even be said that the contextual data could be foreseen as ever serving as an aid to more traditional methods of lexical network construction. The reasons are the identified failure of the salient contextual features to be sufficiently distinctive and the impracticability of gathering them. The theoretical implications are, however, a little more positive. Although the data from corpus-based semantic analysis is perhaps not of much practical use in building a lexical network, that does not mean it has no bearing on its structure.
Our open-ended deliberations on the theoretical (if perhaps not yet practical) possibility of using all of the sense overlaps and incorporating more facets of information obtainable from our full set of data may suggest that further research is necessary to establish whether contextual distribution can yield any solid criteria for lexical network links, since it definitely has an influence on them (seen in the striking correspondence with etymological data), and how far that influence extends. The inference of our analysis seems to be that it may not prove as useful as we might desire, because contextual distribution seems to be motivated much more by issues of frequency (motivated in turn by the strength of the communicative need for given senses and the reference inviting them) than by semantic issues, which are themselves <?page no="142"?> crucial in successfully constructing a lexical network. Perhaps the issue also lies in how we picture the use and functioning of a lexical network, and in how that picture differs from the way the complete schema induction system (on all three levels) actually works. Another consideration is that even a more in-depth look into the semantic influence of contextual features would be hampered by data sparseness, the cancer of corpus-based semantic methodology. So, to get back for the third time to Firth (1957: 11), we can see that the company the words keep is not quite enough to tell us everything. Perhaps it is too erratic, counter-intuitive, non-semantically motivated, impractical to ascertain, and impossible to evaluate for us to really know the word itself.
To truly know a sense, if we expand the metaphor once more, we must find out who and what it likes and who and what it does not, who its parents are and when it was born, whether it has children of its own and when they were born, what relationships it has been involved in and which of those ended badly and which are still current, what it knows about the other members of its company but is reluctant to say, and how it can help us find out more about the places it spends time in and the characters it hangs out with. To know all of that, we might have to call upon much more information than just the company it keeps: we need its entire social network with far more intimate data, the possible criteria-expressible source of which is indeed hard to ascertain at this point.

<?page no="143"?>

7. Discussion

After running the research project for more than two years, overseeing the manual processing of data, poring over data sheets, working out the overlap points and calculating various ratios, we must admit that the overall results presented in the previous three chapters are somewhat disappointing. The interesting aspects of the results heralded in the introduction seem to have been mostly overshadowed by the lack of practical usefulness of the procedure (for at least two of the three aspects of polysemy examined, namely sense distinctness, prototypicality of senses, and lexical networks, with the results on the prototypicality of senses remaining inconsistent).
Listing once more the culprits already identified as responsible for the relatively negative results of applying corpus-based semantics to polysemy, we can see a certain uniformity among them:

sense distinctness and sense disambiguation based on linguistic (micro)contextual data were not successful (when compared to the upper inter-annotator agreement benchmark) due to two practical considerations: data sparseness, which caused the lack of distinctive (strongly predictive) contextual features; and data skewness, which caused an overwhelming imbalance strongly favoring the more frequently occurring senses;

the identification of the prototype sense and the levels of prototypicality within the category of polysemy of the verb look was perhaps only partially successful. The core reading was successfully identified, but for the rest of the senses the results were not uniform and not comparable enough (due to a general lack of any solid grounds for attesting them) to support conclusive claims about their actual accuracy. The theoretical conclusion that such inconclusive results brought up was that prototypicality does not seem to be connected to meaning extensions at all, which impaired our ability to qualitatively evaluate our corpus-based quantitative outline of it. The successful identification of the prototype itself appears to be linked solely to its extremely high frequency of occurrence (in comparison to most other attested senses). Frequency of appearance, then, is the likely driving force behind prototypicality and is linked to the frequency of use, the earliest acquisition of senses, and their easier elicitation;

<?page no="144"?> the construction of the semantic network of the verb look (both arbitrary and hierarchical) based exclusively on the linguistic (distributional) data was also not a very useful enterprise. It was not able to agree sufficiently with either the qualitative analysis of its semantic links or the etymological comparison of the displayed semantic extensions. One of the two major practical causes of such a result is again connected to frequency and involves finding enough distinctive and hence strongly linking features. The other, also a practical reason, lies in the extreme impracticality of constructing a multidimensional network which would involve all of the possible links between all of the possible senses (thinking here of all 35 senses interplaying with 34 other senses in up to 200 features), in addition to the difficulties of performing the extremely time-consuming procedure of the full corpus-based semantic analysis in the first place.

The uniformity we can notice in the reasons why our procedure (and corpus-based statistical approaches in general) failed to provide a satisfying account of the nuances of lexical meaning, 50 years after the original attempts in early MT failed as well, is reflected in one underlying factor: frequency of (co-)occurrence. Frequency of occurrence of the senses (together with the skewness of data that comes as a given) is the main reason behind the problems of data sparseness (on all levels), which is, in turn, the main stumbling block of the applied methodology as a whole. We have seen that no amount of linguistic data (regardless of the size or make-up of the corpus or the complexity of the ontology) can compensate for these two inherent frequency-related problems plaguing corpus-based semantic analyses and all statistics-based approaches to semantics. Such a realization raises the question of whether the whole idea of the distributional and contextual grounding of semantic meaning is, despite our proclaimed agreement with it in all previous chapters, theoretically unsound.
Was John Firth (1957) (along with, as we saw, many other linguists over the last century and a half) actually mistaken?

7.1 Overall theoretical and practical implications

The omnipresent issue of frequency as the main reason behind the failure of the methodology has both practical and theoretical implications for our <?page no="145"?> final account of the nature of meaning and of polysemy, especially when observed exclusively at the linguistic level of the microcontext. The practical implications were already summed up relatively well above. Addressing the issue of lexical meaning by focusing only on linguistic information proved implausible due to the skewness and sparseness of linguistic data (both in corpora and in general). We have noted two reasons why the size of the corpus sample from which we draw contextual data would not have a major effect in producing more satisfactory results: first, it would be extremely demanding to manually (or semi-automatically) process extremely large samples and, second, the quantitative discrepancy between the numbers of very frequent senses and the least frequent ones within a corpus of any size would still cause a problem (influencing both the frequency and the distinctiveness of the co-occurring contextual features). We have also seen that handpicking a corpus sample would be no less time-consuming and would not be free of the same problem of the lack of distinctive data. Extending the ontology of the contextual features, we noted, would first make the procedure even more demanding in terms of both time and human resources; it is also very improbable that a satisfactory number of sufficiently distinctive contextual features can be found within any possible contextual environment, since most of the possible contextual surroundings, especially for the more frequently occurring senses, appear to be simply not discriminating enough.
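The second reason, the persistence of the imbalance at any corpus size, can be made concrete with a small sketch. The sense proportions below are invented for illustration (a Zipf-like distribution over 35 senses, not the figures attested for look), and the names `expected_counts` and `skew_ratio` are likewise assumptions of this example. The point is only that enlarging the sample multiplies all expected counts by the same factor.

```python
from fractions import Fraction

N_SENSES = 35  # number of senses distinguished for the verb look

# Hypothetical Zipf-like proportions: the sense of rank r gets weight 1/r.
# This distribution is an assumption, not the one attested in the study.
weights = [Fraction(1, rank) for rank in range(1, N_SENSES + 1)]
total_weight = sum(weights)
proportions = [w / total_weight for w in weights]


def expected_counts(sample_size):
    """Expected number of occurrences of each sense in a sample."""
    return [p * sample_size for p in proportions]


def skew_ratio(counts):
    """How many times more frequent the top sense is than the rarest one."""
    return max(counts) / min(counts)


for size in (5_000, 500_000, 50_000_000):
    counts = expected_counts(size)
    majority_share = max(counts) / sum(counts)
    print(size, skew_ratio(counts), round(float(majority_share), 3))
```

Under these assumed proportions the ratio between the most and the least frequent sense stays at 35 to 1 for every sample size. A larger corpus yields more tokens of the rare senses in absolute terms, but the tokens of the frequent senses, and the contextual features that come with them, grow at the same rate.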
The last practical considerations stemming from the overall research into the practice of corpus-based semantics (as applied to the polysemy of the verb look) are also the foundations of our main theoretical considerations. The theoretical conclusion that seems unavoidable here is that, when it comes to semantic meaning and the manner in which the multiple readings interplay within a polysemous lexical item, microcontextual conditioning is simply not strong enough to account sufficiently for their nature and their behavior. The major problem of data sparseness in this theoretical respect does not refer to the impossibility of gathering enough linguistic data, as the corpora available these days number hundreds of millions of words (and more) and can boast ever more advanced levels of automatic linguistic processing. Rather, data sparseness should be understood as the impossibility of finding, in any amount of linguistic data, enough microcontextual features distinctive enough to condition a particular sense exclusively, to the extent that its conditioning is, in turn, sufficiently distinct from that of other related senses of the same polysemous item (regardless again of the complexity of the tagset one uses to annotate the <?page no="146"?> microcontexts at hand or the size of the corpus one is working with). Another consideration is the inherent impossibility of reconciling practical application with theoretical considerations. Lexicographers have to decide on the number of distinct senses they include in a dictionary, which they normally do on the basis of practical expediency and/or pedagogical considerations.
Theoretical linguists, on the other hand, can falsify claims about distinct senses, but will not be able to come up with the single correct analysis, not with all the data in the world, for the simple reason that the theoretical analysis eventually touches upon cognition, and cognition probably relies on more than the linguistic context alone. The promise of an objective and purely linguistic account of word senses seems indeed a spurious one. This leads us to our question of whether Firth (1957: 11), together with a large portion of the linguistic community (including us), has been misguided in postulating context as the sole generator of every aspect of semantic meaning. The answer, it seems, must be that we are compelled to alter Firth’s claim only slightly and say that he was not in fact wrong in his maxim, but rather not explicit enough. Meaning is, to its largest extent, within the context but, as it seems, not within the linguistic context alone, which can only take us so far in both practical and theoretical terms. It needs to be stressed that looking at other types of context is imperative: if we embrace, as we did, the inherently pragmatic nature of lexical meaning, we must consider the various types of contextual influences we use when inferring meaning (such as our general world knowledge, situational information, co-textual information, interpersonal knowledge, various sociolinguistic aspects of language, or cognitive processing). All of the deliberations presented so far finally bring us to a point from which we can provide a conclusive account of corpus-based semantics. As a model for the description of (lexical) meaning it is a very useful paradigm. It follows from an old and well-established idea of the contextual conditioning of meaning (Paul 1920 [1880]) and permeates a whole myriad of existing usage-based or exemplar-based theoretical accounts (outlined in Chapter 2).
Terming it corpus-based semantics (rather than adopting one of the mentioned related approaches) seems more appealing and more revealing, as the name refers both to the identified theoretical background of meaning and to its empirical grounding. The practice of corpus-based semantics, however, does not seem to be so straightforward or fulfilling. We have taken one of the more recent practical amalgamations of this usage-based view of meaning, Behavioral Profiling (Gries 2010), as a good representative of how such practical methodologies (listed in Chapter 4) function. <?page no="147"?> We have applied its general principles in a simplified, step-by-step, completely manual way to the semantic issue of polysemy. We identified three aspects of polysemy (of, in our case, the verb look) as crucial: sense distinctness, prototypicality of readings, and the make-up of its lexical network. The practical application and its results proved inconclusive at best, unsuccessful at worst, and impractical in all ways. The lengthy and demanding procedure of obtaining all the data necessary for a full linguistic description of the said aspects of polysemy did not yield any conclusive results. It is also questionable whether it is even, as was suggested (Gries 2006), a useful aid to other, more traditional, qualitative approaches to semantic (lexicographic) analysis, considering the amount of work put in, the inherent insufficiencies of such data, and the inconclusive results subsequently obtained. It appears that even if we can acquire some useful distributional information from a full corpus-based analysis of a semantic issue (to serve us at least as the said aid in our work), the pay-off for the effort invested is just not sufficient (and will not be so until the automatic processing of corpus data becomes available for any complexity of ontology we may require).
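For readers who want a concrete picture of the kind of data such a procedure produces, the following is a minimal sketch of the profile-building step in the general spirit of Behavioral Profiling: annotated tokens are turned into per-sense relative frequencies of contextual features. The sense labels, tag names, and annotated examples are invented, and the `overlap` measure is a simple stand-in chosen for this example rather than the calculation used in the study.

```python
from collections import Counter, defaultdict

# Hypothetical manually annotated concordance lines: (sense label, ID tags).
# Sense labels, tag names, and values are invented for illustration.
ANNOTATED = [
    ("direct_gaze", {"subject": "human", "complement": "at_PP"}),
    ("direct_gaze", {"subject": "human", "complement": "at_PP"}),
    ("direct_gaze", {"subject": "human", "complement": "none"}),
    ("appear", {"subject": "human", "complement": "adjective"}),
    ("appear", {"subject": "inanimate", "complement": "adjective"}),
]


def build_profiles(tokens):
    """Return, per sense, the relative frequency of each (tag, value) pair."""
    counts = defaultdict(Counter)
    totals = Counter()
    for sense, tags in tokens:
        totals[sense] += 1
        for tag, value in tags.items():
            counts[sense][(tag, value)] += 1
    return {
        sense: {feature: n / totals[sense] for feature, n in feats.items()}
        for sense, feats in counts.items()
    }


def overlap(profile_a, profile_b):
    """Sum of shared relative frequencies (an assumed, simple measure)."""
    shared = set(profile_a) & set(profile_b)
    return sum(min(profile_a[f], profile_b[f]) for f in shared)


profiles = build_profiles(ANNOTATED)
print(profiles["direct_gaze"][("subject", "human")])
print(overlap(profiles["direct_gaze"], profiles["appear"]))
```

In this toy data the only feature the two senses share is a human subject, which is exactly the kind of feature that is frequent without being distinctive.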
The drain on resources such an analysis poses today renders the application of this methodology next to useless in lexicographic and NLP/WSD terms. Bearing in mind, then, that the contextual information we found missing is housed in various types of context, all of which are inaccessible, elusive (or illusory), and unquantifiable (such as cognition or world knowledge), the final question we need to ask is whether it is possible to formally account for all of them in practice and produce a final, tangible, and conclusive account of semantic meaning and polysemy, both in theoretical and, even more importantly, in practical terms.

7.2 Implications for future research

The answer to this important question posed to semantics as we understand it is a theoretically categorical ‘yes’ and a practically reluctant ‘probably not’, or better ‘not yet’. We have seen in the elaborations above (and it has been noted by numerous scholars) that semantic meaning is indeed hidden somewhere in a no-man’s land, existing on-line and in potentia between our cognition, the linguistic systems in pragmatic use, and the social, discourse, and ‘real’ world we are a part of. Whether we will ever be able to have a proper empirical glimpse of all of these facets of the true nature of meaning is a major issue of <?page no="148"?> lexical semantics today. There is a great deal of doubt as to whether there is an actual scientific bottom to meaning to be reached. Such a realization should not, however, mean that we should stop trying to reach as far down as we can. We could see from the research presented in this book that a full linguistic description of meaning (in all of its aspects) is completely within our grasp through the existence of large (online) corpora and the ever-improving ways of processing them. Linguistic information, unfortunately, only takes us so far down the bottomless hole that meaning is.
Regarding the other types of context we have mentioned as necessary to account for, we can only list examples of the work being done in an effort to account for them as well. Providing conclusive accounts of their concrete contributions and the answers they (may) bring would require us to embark on a whole new set of research projects and lies far outside the scope of corpus-based semantics and its possibilities. Hence, as just a brief outline of existing options, we can start with general or encyclopedic knowledge, since a lot of important work has been done in this field from a semantic point of view through the construction of knowledge bases. Knowledge bases in linguistics are an attempt to build on the construction of corpora by adding logical systems of data involving factual information about the ‘real’ world based on complex ontologies, all in an attempt to simulate the manner in which such knowledge and its links with language are supposed to operate within our minds (see for instance Bateman et al. (1989); or Knight and Luk (1994)). More work is still necessary for them to be of practical use, but there have been major breakthroughs nonetheless. The second avenue comes from the fact that many cognitive linguists have started to take advantage of modern medical apparatus and have been spearheading the field of neurolinguistics for several decades now. Unlike cognitive linguistics, which in dealing with human linguistic cognition is limited to linguistic evidence alone, neurolinguistics attempts to peek at the actual physical processes going on in our brains (rather than in our elusive minds) while processing language (for more see Caplan (1987); Ahlsén (2006); or Ingram (2007)).
Combining, quantifying, formalizing, and possibly computerizing the results of all these attempts to address different levels of meaning production and disambiguation and its background, results which will continue to come in and will become more advanced and more revealing in the following decades (having here again great faith in the technological advances awaiting us in the near future), will hopefully give us enough rope to go sufficiently deep down the rabbit hole that is semantic meaning. Corpus-based semantics can and does provide us with the first length of it.

<?page no="149"?>

Bibliography

Ahlsén, Elisabeth (2006). Introduction to Neurolinguistics. Amsterdam: John Benjamins.
Aitchison, Jean (1998). The Articulate Mammal: Introduction to Psycholinguistics. London: Routledge.
Anderson, John (1983). “A Spreading Activation Theory of Memory.” Journal of Verbal Learning and Verbal Behavior 22. 261-295.
Approach to Lexical Discovery.” Workshop on Statistically-Based Natural-Language-Processing Techniques.
Apresjan, Juri (1973). “Regular Polysemy.” Linguistics 142. 5-2.
Artiles, Javier; Amigó, Enrique and Gonzalo, Julio (2009). “The role of named entities in web people search.” In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP '09), Vol. 2. Stroudsburg: Association for Computational Linguistics. 534-542.
Atkins, Beryl T. Sue (1987). “Semantic ID Tags: Corpus Evidence for Dictionary Senses.” In: Proceedings of the Third Annual Conference of the UW Centre for the New Oxford English Dictionary. Waterloo: University of Waterloo. 17-36.
Atwell, Eric (1986). Extracting a Natural Language Grammar from Raw Text - Research Report 208. School of Computer Studies: University of Leeds.
Baroni, Marco and Lenci, Alessandro (2010). “Distributional Memory: A General Framework for Corpus-Based Semantics.” Computational Linguistics 36/4. 673-721.
Barwise, Jon and Perry, John (1983). Situations and Attitudes.
Cambridge: MIT-Bradford.
Bateman, John; Kasper, Robert; Schütz, Jörg and Steiner, Erich (1989). “A New View on the Process of Translation.” In: Proceedings of the fourth conference on European chapter of the Association for Computational Linguistics (EACL '89). Stroudsburg: Association for Computational Linguistics. 282-290.
Bierwisch, Manfred (1987). “Linguistik als kognitive Wissenschaft: Erläuterungen zu einem Forschungsprogramm.” Zeitschrift für Germanistik 6. 645-667.
Blutner, Reinhard (1998). “Lexical Pragmatics.” Journal of Semantics 15. 115-162.
Bolinger, Dwight (1968). “Entailment and the Meaning of Structures.” Glossa 2. 119-127.
<?page no="150"?>
Bosch, Peter (1979). “Vagueness, Ambiguity, and All the Rest. An Explication and an Intuitive Test.” In: Marc van de Velde and Willy Vandeweghe (eds.). Sprachstruktur, Individuum und Gesellschaft. Tübingen: Niemeyer. 9-19.
Bréal, Michel (1897). Essai de sémantique: science des significations. Paris: Hachette.
Brown, Peter; Della Pietra, Stephen; Della Pietra, Vincent and Mercer, Robert (1991). “Word Sense Disambiguation Using Statistical Methods.” In: Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. Berkeley: Association for Computational Linguistics. 264-270.
Brown, Peter; Della Pietra, Vincent; deSouza, Peter; Lai, Jennifer and Mercer, Robert (1992). “Class-based N-Gram Models of Natural Language.” Computational Linguistics 18/4. 467-47.
Brugman, Claudia (1988). The Story of Over: Polysemy, Semantics and the Structure of the Lexicon. New York: Garland.
Brugman, Claudia and Lakoff, George (1988). “Cognitive Topology and Lexical Networks.” In: Steven Small, Garrison W. Cottrell and Michael Tanenhaus (eds.). Lexical Ambiguity Resolution. San Mateo: Morgan Kaufmann. 477-508.
Bybee, Joan (1985). Morphology: A Study of the Relation between Meaning and Form. Amsterdam: John Benjamins.
Bybee, Joan (1998).
“A Functionalist Approach to Grammar and its Evolution.” Evolution of Communication 2/2. 249-278.
Caplan, David (1987). Neurolinguistics and Linguistic Aphasiology: An Introduction. Cambridge: Cambridge University Press.
Charles, Walter and Miller, George (1989). “Contexts of Antonymous Adjectives.” Applied Psycholinguistics 3/10. 357-375.
Chodorow, Martin; Byrd, Roy and Heidorn, George (1985). “Extracting Semantic Hierarchies from a Large On-Line Dictionary.” In: Proceedings of the 23rd Annual Meeting of the Association for Computational Linguistics. Chicago: University of Chicago. 299-304.
Collier, Alex and Pacey, Mike (1997). “A Large-Scale Corpus System for Identifying Thesaural Relations.” In: Magnus Ljung (ed.). Corpus-Based Studies in English: Papers from the 17th International Conference on English Language Research on Computerized Corpora. Amsterdam: Rodopi. 87-100.
Collins, Alan and Quillian, Ross (1969). “Retrieval Time from Semantic Memory.” Journal of Verbal Learning and Verbal Behavior 8. 240-247.
<?page no="151"?>
Considine, John (2008). Dictionaries in Early Modern Europe: Lexicography and the Making of Heritage. Cambridge: Cambridge University Press.
Coseriu, Eugenio (1962). Teoría del lenguaje y lingüística general: cinco estudios. Madrid: Gredos.
Cottrell, Garrison and Small, Steven (1983). “A Connectionist Scheme for Modelling Word Sense Disambiguation.” Cognition and Brain Theory 6. 89-120.
Croft, William (2001). Radical Construction Grammar: Syntactic Theory in Typological Perspective. Oxford: Oxford University Press.
Cruse, Alan (1986). Lexical semantics. Cambridge: Cambridge University Press.
Dagan, Ido and Itai, Alon (1994). “Word Sense Disambiguation Using a Second Language Monolingual Corpus.” Computational Linguistics 20/4. 563-596.
Dagan, Ido; Marcus, Shaul and Markovitch, Shaul (1993).
“Contextual Word Similarity and Estimation from Sparse Data.” In: Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics. Somerset: Association for Computational Linguistics. 164-171.
Dagan, Ido; Pereira, Fernando and Lee, Lillian (1994). “Similarity-based Estimation of Word Co-occurrence Probabilities.” In: Proceedings of the 32nd Annual Meeting of Association for Computational Linguistics. Las Cruces: Association for Computational Linguistics. 272-278.
Davies, Mark (2011). “The Corpus of Contemporary American English as the First Reliable Monitor Corpus of English.” Literary and Linguistic Computing 2. 447-465.
Deane, Paul (1987). Semantic Theory and the Problem of Polysemy. Ph.D. Thesis at the Department of Linguistics, University of Chicago.
Deane, Paul (1988). “Polysemy and Cognition.” Lingua 75. 325-361.
Deese, James (1962). “On the structure of associative meaning.” Psychological review 69. 161-175.
Divjak, Dagmar and Gries, Stefan (2009). “Corpus-based Cognitive Semantics: A Contrastive Study of Phasal Verbs in English and Russian.” In: Katarzyna Dziwirek and Barbara Lewandowska-Tomaszczyk (eds.). Studies in Cognitive Corpus Linguistics. Frankfurt am Main: Peter Lang. 273-296.
Dobrić, Nikola (2009a). “Korpusni pristup kao nova paradigma istraživanja jezika.” Philologia 6. 31-41.
<?page no="152"?>
Dobrić, Nikola (2009b). “Extracting Information from Large Digital Corpora - A Case Study in Quantitative Methods in Linguistics.” Primenjena lingvistika 10. 103-113.
Dobrić, Nikola (2010). “Word Sense Disambiguation Using ID Tags - Identifying Meaning in Polysemous Words in English.” In: Proceedings of the 29th International Conference on Lexis and Grammar. Belgrade: University of Belgrade. 97-105.
Dobrić, Nikola (2011). “The Cognitive Approach to Translating Metaphors Revisited - the Case of Pure, Clear and Clean vs. Čist and Jasan.” Arbeiten aus Anglistik und Amerikanistik 36/2. 99-117.
Dobrić, Nikola (2012).
“Language Corpora in the West Balkans - History, Current State and Future Perspective.” Slavistična revija 60/4. 677-692.
Du Marseis, César (1730). Des tropes ou Des diferens sens dans lesquels on peut prendre un méme langue. Paris: Brocas.
Duffield, Cecily Jill; Hwang, Jena; Windisch Brown, Susan; Dligach, Dmitriy; Vieweg, Sarah; Davis, Jenny and Palmer, Martha (2007). “Criteria for the Manual Grouping of Verb Senses.” In: Proceedings of the Linguistic Annotation Workshop. Prague: Association for Computational Linguistics. 49-52.
Dunbar, George (2001). “Towards A Cognitive Analysis of Polysemy, Ambiguity, and Vagueness.” Cognitive Linguistics 12. 1-14.
Edmonds, Philip and Kilgarriff, Adam (2002). “Introduction to the Special Issue on Evaluating Word Sense Disambiguation Systems.” Natural Language Engineering 8/4. 279-291.
Evans, Vyvyan and Green, Melanie (2006). Cognitive Linguistics: An Introduction. Mahwah: Erlbaum.
Falkum, Ingrid (2011). The Semantics and Pragmatics of Polysemy: A Relevance-Theoretic Account. Ph.D. Thesis at the University College London.
Fellbaum, Christiane (2005). “WordNet and Wordnets.” In: Keith Brown et al. (eds.). Encyclopedia of Language and Linguistics, 2nd Edition. Oxford: Elsevier. 665-670.
Fillmore, Charles (1975). “An Alternative to Checklist Theories of Meaning.” In: Proceedings of the First Annual Meeting of the Berkeley Linguistics Society. Berkeley: Berkeley Linguistics Society. 123-131.
Fillmore, Charles (1977). “Scenes-and-Frames Semantics.” In: Antonio Zampolli (ed.). Linguistic Structures Processing. 55-82.
Fillmore, Charles (1978). “On The Organization of Semantic Information in the Lexicon.” In: Donka Farkas et al. (eds.). Papers from the Parasession on the Lexicon. Chicago: Chicago Linguistic Society. 148-173.
<?page no="153"?>
Fillmore, Charles (1982). “Towards a Descriptive Framework for Spatial Deixis.” In: R. J. Jarvella and Wolfgang Klein (eds.).
Speech, Place, and Action: Studies of Deixis and Related Topics. Chichester: Wiley. 31-59.
Fillmore, Charles (1985). “Frames and the Semantics of Understanding.” Quaderni di semantica 6. 222-254.
Firth, John (1957). Papers in Linguistics. London: Oxford University Press.
Fodor, Jerry (1998). Concepts: Where Cognitive Science Went Wrong. Oxford: Oxford University Press.
Fontenelle, Thierry (1997). Turning a Bilingual Dictionary into a Lexical-Semantic Database. Tübingen: Niemeyer.
Fries, Charles and Traver, Aileen (1940). English Word Lists: A Study of their Adaptability and Instruction. Washington: American Council of Education.
Gale, William; Church, Kenneth and Yarowsky, David (1992). “Estimating Upper and Lower Bounds on the Performance of Word-Sense Disambiguation Programs.” In: Proceedings of 30th Meeting of the Association for Computational Linguistics. St. Andrews: The Association for Computational Linguistics. 249-256.
Garside, Robert (1987). “A Hybrid Grammatical Tagger: CLAWS4.” In: Robert Garside et al. (eds.). Corpus Annotation: Linguistic Information from Computer Text Corpora. London: Longman. 102-121.
Geeraerts, Dirk (1988). “Cognitive Grammar and the History of Lexical Semantics.” In: Brygida Rudzka-Ostyn (ed.). Topics in Cognitive Linguistics. Amsterdam: Benjamins. 647-77.
Geeraerts, Dirk (1993). “Vagueness's Puzzles, Polysemy's Vagaries.” Cognitive Linguistics 4. 223-272.
Geeraerts, Dirk (2010). Theories of Lexical Semantics. New York: Oxford University Press.
Gilquin, Gaëtanelle (2006). “The Place of Prototypicality in Corpus Linguistics. Causation in the Hot Seat.” In: Stefan Gries and Anatol Stefanowitsch (eds.). Corpora in Cognitive Linguistics: Corpus-based Approaches to Syntax and Lexis. Berlin, Heidelberg, New York: Mouton de Gruyter. 159-191.
Gilquin, Gaëtanelle (2008). “Taking a New Look at Lexical Networks.” Lexis 1. 23-39.
Goldberg, Adele (1995). Constructions: A Construction Grammar Approach to Argument Structure.
Chicago: University of Chicago Press. Goldberg, Adele (2006). Constructions at Work. The Nature of Generalization in Language. Oxford: Oxford University Press <?page no="154"?> 154 Greimas, Algirdas (1966). Sémantique structurale: recherche de méthode. Paris: Larousse. Grice, Paul (1975). “Logic and Conversation.” In: Peter Cole, and Jerry Morgan (eds.). Speech Acts. New York: Academic Press. 41-58. Gries, Stefan (2006). “Corpus-based Methods and Cognitive Semantics: The Many Senses of to Run.” In: Stefan Gries and Anatol Stefanowitsch (eds.). Corpora in Cognitive Linguistics: Corpus-based Approaches to Syntax and Lexis. Berlin, Heidelberg, New York: Mouton de Gruyter. 57-99. Gries, Stefan (2010). “Behavioral Profiles: A Fine-Grained and Quantitative Approach in Corpus-Based Lexical Semantics.” Mental Lexicon 3/ 5. 323- 346. Gries, Stefan and Otani, Naok (2010). “Behavioral profiles: A Corpus-based Perspective on Synonymy and Antonymy“. ICAME Journal 34. 121-150. Grishman, Ralph and Sterling, John (1993). “Smoothing of Automatically Generated Selectional Constraints.” In: Human Language Technology: Proceedings of the ARPA Workshop. San Francisco: Morgan Kaufmann. 254- 259. Grkovi -Mejdžor, Jasmina (2008). “O kognitivnim osnovama semanti ke promene.” In: Milorad Radovanovi and Predrag Piper (eds.). Semanti ka prou avanja srpskog jezika. Belgrade: SANU. 49-63. Guerberof Arenas, Ana (2010). “Exploring Machine Translation on the Web.” Tradumàtica: traducció i tecnologies de la informació i la comunicació 8. 1-6. Hanks, Patrick (1996). “Contextual Dependency and Lexical Sets.” International Journal of Corpus Linguistics 1/ 1. 75-98. Harris, Zellig (1954). “Distributional Structure.” Word 10. 146-162. Hayes, Philip (1977). Some Association-based Techniques for Lexical Disambiguation by Machine. Ph.D thesis at the Département de Mathématiques, Ecole Polytechnique Fédérale de Lausanne. Hearst, Marti (1991). 
“Noun Homograph Disambiguation Using Local Context in Large Corpora.” In: Proceedings of the 7th Annual Conference of the University of Waterloo Centre for the New OED and Text Research. Waterloo: University of Waterloo. 1-19. Hecht, Max (1888). Die griechische Bedeutungslehre: eine Aufgabe der klassischen Philologie. Leipzig: Teubner. Hopper, Paul (1987). “Emergent Grammar.” In: Proceedings in of the 13 th Berkeley Linguistics Conference. Berkeley: Berkeley Linguistics Society. 139-157. <?page no="155"?> 155 Hunston, Susan and Francis, Gill (1999). Pattern Grammar. A Corpus-driven Approach to the Lexical Grammar of English. Amsterdam & Philadelphia: John Benjamins. Ide, Nancy and Véronis, Jean (1998). “Introduction to the Special Issue on Word Sense Disambiguation: The State of the Art.” Computational Linguistics 24/ 1. 1-40. Ide, Nancy and Wilks, Yorick (2006). “Making Sense about Sense”. In: Eneko Agirre and Philip Edmonds (eds.). Word Sense Disambiguation: Algorithms and Applications. New York: Springer. 47-73. Ingram, John (2007). Neurolinguistics: An Introduction to Spoken Language Processing and its Disorders. Cambridge: Cambridge University Press. Jackendoff, Ray (1996). “Conceptual Semantics and Cognitive Linguistics. “ Cognitive Linguistics 7. 93-129. Kaplan, Abraham (1955). “An Experimental Study of Ambiguity and Context.” Mechanical Translation 2/ 2. 39-46. Katz, Jerrold and Fodor, Jerry (1963). “The Structure of a Semantic Theory.” Language 39. 170-210. Kay, Paul and Fillmore, Charles (1999). “Grammatical Constructions and Linguistic Generalizations: The ‘What’s X Doing Y? ’ Construction.” Language 75. 1-33. Kelly, Edward and Stone, Philip (1975). Computer Recognition of English Word Senses. Amsterdam: North-Holland. Kilgarriff, Adam (2006). “Word Senses.” In: Eneko Agirre and Philip Edmonds (eds.). Word Sense Disambiguation: Algorithms and Applications. New York: Springer. 29-46. Kilgarriff, Adam and Rosenzwieg, Joseph (2000). 
“English Senseval: Report and Results.” In: Proceedings of the 2nd Conference on Language Resources and Evaluation. Athens: ELDA. 1239-1244. Knight, Kevin and Luk, Steven (1994). “Building a Large-Scale Knowledge Base for Machine Translation.“ In: Proceedings of the National Conference on Aritificial Intelligence. Seattle: AAAI. 773-778. Kucera, Henri and Francis, Winthrop (1967). Computational Analysis of Present-Day American English. Providence: Brown University Press. Labov, William (1973). “The Boundaries of Words and their Meanings.” In: Charles-lanes Bailey and Roger Shuy (eds.). New Ways of Analysing Variation in English. Washington: Georgetown University Press. 340-371. Lakoff, George (1987). Women, Fire, and Dangerous Things. What Categories Reveal about the Mind. Chicago: University of Chicago Press. <?page no="156"?> 156 Langacker, Ronald (1987). Foundations of Cognitive Grammar 1: Theoretical Prerequisites. Stanford: Stanford University Press. Leacock, Claudia; Miller, George and Chodorow, Martin (1998). “Using Corpus Statistics and WordNet Relations for Sense Identification.” Computational Linguistics 24/ 1. 147-165. Leech, Geoffrey and Fligelstone, Steve (1992). “Computers and Corpus Analysis.” In: Christopher Butler (ed.). Computers and Written Texts. Oxford: Blackwell. 115-140. Leibniz, Wilhelm (1996[1725]). New Essays on Human Understanding. Peter Remnant and Jonathan Francis (eds.). Cambridge: Cambridge University Press. Lesk, Michael (1986). “Automated Sense Disambiguation Using Machine- Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone.” In: Proceedings of the 1986 SIGDOC Conference. Toronto: ACM Press. 24- 26. Lindström, Sten (1991). “Critical Study: Jon Barwise & John Perry, Situations and Attitudes.” Nous 5.743-770. Lipka, Leonhard (1972). Semantic Structure and Word-Formation: Verb- Particle Constructions in Contemporary English. Munich: Fink. Locke, John (1975[1689]). An essay concerning human understanding. Peter H. 
Nidditch (ed.). Oxford: Oxford University Press. Lyons, John (1963). Structural Semantics. Oxford: Blackwell. Lyons, John (1977). Semantics. Cambridge: Cambridge University Press MacLaury, Robert (1991). “Prototypes Revisited.” Annual Review of Anthropology 20. 55-74. Masterman, Margaret (1961). “Semantic Message Detection for Machine Translation, Using an Interlingua.” In: Proceedings of the International Conference on Machine Translation of Languages and Applied Language Analysis. London: Her Majesty’s Stationery Office. 437-475. Mel’ uk, Igor (1988). Dependency Syntax: Theory and Practice. Albany: SUNY Press. Mel’ uk, Igor (1995). “Phrasemes in Language and Phraseology in Linguistics.” In: Martin Everaert, Erik-Jan van der Linden, André Schenk and Rob Schreuder (eds.). Idioms: Structural and Psychological Perspectives. Hillsdale: Lawrence Erlbaum. 167-232. Mel’ uk, Igor (1996). “Lexical Functions: A Tool for the Description of Lexical Relations in a Lexicon. “ In: Leo Wanner (ed.). Lexical Functions in Lexicography and Natural Language Processing. Amsterdam: Benjamins. 37-102. <?page no="157"?> 157 Mel’ uk, Igor (1998). “Collocations and Lexical Functions.” In: Anthony Cowie (ed.). Phraseology. Theory, Analysis, and Applications. Oxford: Claredon Press. 23-53. Meyer, David and Schvaneveldt, Roger (1971). “Facilitation in Recognizing Pairs of Words: Evidence of a Dependence Between Retrieval Operations.” Journal of Experimental Psychology 90/ 2. 227-234. Miller, George; Beckwith, Richard; Fellbaum, Christiane; Gross, Derek and Miller, Katherine (1990). “WordNet: An On-line Lexical Database.” International Journal of Lexicography 4/ 3. 235-244. Navigli, Roberto (2009). “Word Sense Disambiguation: A Survey”. ACM Computing surveys 41/ 2. 1-69. Ng, Hwee Tou (1997). “Exemplar-Based Word Sense Disambiguation: Some Recent Improvements.” In: Proceedings of the 2nd Conference on Empirical Methods in Natural Language Processing. Heidelberg: Spektrum Akademischer Verlag. 
208-213. Norvig, Peter and Lakoff, George (1987). “Taking: A Study in Lexical Network Theory.” In: Aske Jon, Natasha Beery, Laura Michaelis and Hana Filip (eds.). Proceedings of the Thirteenth Annual Meeting of the Berkeley Linguistics Society. Berkeley: BLS. 195-206. Nunberg, Geoffrey (1979). “The Non-Uniqueness of Semantic Solutions: Polysemy.” Linguistics and Philosophy 2. 143-184. Onysko, Alexander (2011). “Boundaries of language? A glance from a cognitive semantics point of view.” In: Holzer, Peter, Manfred Kienpointner, Julia Pröll and Ulla Ratheiser (eds.). An den Grenzen der Sprache. Innsbruck: Innsbruck University Press. 237-254. Palmer, Martha; Ng, Hwee Tou and Dang, Hoa Trang (2006). “Evaluation of WSD Systems.” In: Eneko Agirre and Philip Edmonds (eds.). Word Sense Disambiguation: Algorithms and Applications. New York: Springer. 75-106. Paul, Hermann (1920 [1880]). Prinzipien der Sprachgeschichte. Halle: Niemeyer. Pethö, Gergely (1999). “Die Behandlung der Polysemie in der Zwei- Ebenen-Semantik und den prototypentheoretischen Semantiken.“ Sprachtheorie und Germanistische Linguistik 9/ 1. 19-57. Porzig, Walter (1934). “Wesenhafte Bedeutungsbeziehungen.“ Beiträge zur Geschichte der deutschen Sprache und Literatur 58. 70-97. Pottier, Bernard (1964). “Vers une sémantique moderne.“ Travaux de linguistique et de littérature 2. 107-137. <?page no="158"?> 158 Pottier, Bernhard (1965). “La définition sémantique dans les dictionnaires.” Travaux de linguistique et de littérature 3. 33-39 Pustejovsky, James (1995). The Generative Lexicon. Cambridge: MIT Press Pustejovsky, James (1998). The Generative Lexicon. Cambridge: MIT Press. Pustejovsky, James (2006). “Type Theory and Lexical Decomposition.” Journal of Cognitive Science 7/ 1. 39-76. Pustejovsky, James; Bogureav, Branimir and Johnston, Michael (1995). “A Core Lexical Engine: the Contextual Determination of Word Sense.” Technical Report. Department of Computer Sciences, Brandeis University. 
Ramos, Margarita Alonso; Tutin, Agnes and Lapalme, Guy (1995). “Lexical Functions of the Explanatory Combinatorial Dictionary for Lexicalization in Text Generation.“ In: Patrick Saint-Dizier and Evelyne Viegas (eds.). Computational Lexical Semantics. Cambridge: Cambridge University Press. 351-366. Reifler, Erwin (1955). “The Mechanical Determination of Meaning.” In: Willian Locke and Andrew Booth (eds.). Machine Translation of Languages. New York: John Wiley & Sons. 136-164. Renouf, Antoinette (1996). “The ACRONYM Project: Discovering the Textual Thesaurus.” In: Carol Percy, Charles Meyer and Ian Lancashire (eds.). Synchronic Corpus Linguistics. Amsterdam & Atlanta: Rodopi. 171-187. Resnik, Philip (1992). “WordNet and Distributional Analysis: Class-Based Rosch, Eleanor and Mervis, Carolyn (1975). “Family Resemblances: Studies in the Internal Structure of Categories.” Cognitive Psychology 7. 573-605. Rubenstein, Herbert and Goodenough, John (1965). “Contextual Correlates of Synonymy.” Communications of the ACM 10/ 8. 627-633. Ruhl, Charles (1989). On Monosemy: A Study in Linguistic Semantics. Albany: State University of New York Press. Sandra, Dominiek (1998). “What Linguists Can and Can’t Tell you About the Human Mind: A Reply to Croft.” Cognitive Linguistics 9/ 4. 361-378. Sandra, Dominiek and Rice, Sally (1995). “Network Analyses of Prepositional Meaning: Mirroring Whose Mind - The Linguist’s or the Language User’s? ” Cognitive Linguistics 6/ 1. 89-130. Saussure, Ferdinand de (1916). Cours de linguistique générale. Charles Bally and Albert Séchehaye (eds.). Lausanne/ Paris: Payot. Schmid, Hans-Jörg (2000). English Abstract Nouns as Conceptual Shells. From Corpus to Cognition. Berlin/ New York: Mouton de Gruyter. <?page no="159"?> 159 Schütze, Heinrich (1993). “Word Space.” In: Stephen Hanson, Jack Cowan and Lee Giles (eds.). Advances in Neural Information Processing Systems 5. San Mateo: Morgan Kauffman. 895- 902. Schütze, Heinrich (1998). 
“Automatic Word Sense Discrimination.” Computational Linguistics 24/ 1. 97-123. Searle, John (1969). Speech Acts. Cambridge: Cambridge University Press. Sinclair, John (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press. Sperber, Dan and Wilson, Deirdre (1995). Postface to the Second Edition of Relevance: Communication and Cognition. Oxford: Blackwell. Stefanowitsch, Anatol and Stefan, Gries (2003). “Collostructions: Investigating the Interaction Between Words and Constructions.” International Journal of Corpus Linguistics 2/ 8. 209-243. Steyvers , Mark and Tenenbaum, Joshua (2005). “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth.” Cognitive Science 29. 41-78. Stone, Philip (1969). “Improved Quality of Content Analysis Categories: Computerized Disambiguation Rules for High-Frequency English Words.” In: George Gerbner, Ole Holsti, Klaus Krippendorf, William Paisley and Philip Stone (eds.). The Analysis of Communication Content. New York: John Wiley and Sons. 199-221. Teubert, Wolfgang (1996). “Editorial.” International Journal of Corpus Linguistics 1/ 1. iii-x. Teubert, Wolfgang (2005). “My Version of Corpus Linguistcis.“ International Journal of Corpus Linguistics 10/ 1. 1-13. Teubert, Wolfgang (2010). Meaning, Discourse and Society. Cambridge: Cambridge University Press. Trier, Jost (1931[1973]). “Über Wort- und Begriffsfelde.“ In: Lthar Schmidt [ed.]. Wortfeldforschung. Zur Geschichte und Theorie des sprachlichen Feldes. Darmstadt: Wissenschaftliche Buchgesellscaf. 1-38. Tsohatzidis, Savas (ed.). (1990). Meanings and prototypes: Studies in linguistic categorization. London: Routledge. Tuggy, David (1993). “Ambiguity, Polysemy, and Vagueness.” Cognitive Linguistics 4/ 3. 273-290. Tyler, Andrea and Evans, Vyvyan (2001). “Reconsidering Prepositional Polysemy Networks: The Case of Over.” Language 4/ 77. 724-765. Tyler, Andrea and Evans, Vyvyan (2003). 
The Semantics of English Prepositions: Spatial Scenes, Embodied Meaning and Cognition. Cambridge: Cambridge University Press. <?page no="160"?> 160 Ullmann, Stephen (1975[1952]). Précis de sémantique francaise. Bern: A. Francke Verlag. Voorhees, Ellen; Leacock, Claudia and Towell, Geoffrey (1995). “Learning Context to Disambiguate Word Senses.” In: Thomas Petsche, Stephen José Hanson and Shavlik, Jude (eds.). Computational Learning Theory and Natural Learning Systems. Cambridge, Mass.: MIT Press. Waltz, David and Pollack, Jordan (1985). “Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation.” Cognitive Science 9. 51-74. Weaver, Warren (1949[1955]). “Translation.” Mimeographed. 12. [Reprinted in Locke, William and Booth, Donald. (1955). (eds.). Machine Translation of Languages. New York: John Wiley & Sons. 15-23.] Weisgerber, Leo (1927). “Die Bedeutungslehre-ein Irrweg der Sprachwissenschaft? “ Germanisch-Romanische Monatsschrift 15. 161-183. Weiss, Stephen (1973). “Learning to Disambiguate.“ Information Storage and Retrieval 9. 33-41. Wierzbicka, Anna (1985). Lexicography and Conceptual Analysis. Ann Arbor: Karoma. Wilks, Yorick; Fass, Dan; Guo, Cheng-Ming; MacDonald, James; Plate, Tony and Slator, Brian (1990). “Providing Machine Tractable Dictionary Tools.” In: James Pustejovsky (ed.). Semantics and the Lexicon. Cambridge. Mass.: MIT Press. Yarowsky, David (1992). “Word Sense Disambiguation Using Statistical Models of Roget's Categories Trained on Large Corpora.” In: Proceedings of the 14th International Conference on Computational Linguistics (COLING'92). Nantes: International Committee on Computational Linguistics. 454-460. Zipf, George (1935). The Psycho-Biology of Language: An Introduction to Dynamic Biology. Cambridge. Mass.: MIT Press. Sources British National Corpus (BNC) - http: / / www.natcorp.ox.ac.uk/ . Bing Translator - http: / / www.microsofttranslator.com/ . 
Brown Corpus - http://icame.uib.no/brown/bcm.html.
CASIS (Computer Aided Sense Recognition System) - http://casis.uni-klu.ac.at/users/login.
CASIS Corpus - http://casis.uni-klu.ac.at/casis/users/welcome.
<?page no="161"?>
CASIS Example Corpus - http://casis.uni-klu.ac.at/casis_example/.
CoreLex - http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html.
Corpus of Contemporary American English (COCA) - http://www.americancorpus.org/.
Google Translate - http://translate.google.com/.
New Oxford Dictionary of English 2001. Oxford: Oxford University Press.
Oxford English Dictionary, 2nd Edition 1989. Oxford: Oxford University Press.
RANDOM.ORG - http://www.random.org.
SPSS Version 20 - http://www-01.ibm.com/software/analytics/spss/.
The Oxford Dictionary of Idioms 2000. Oxford: Oxford University Press.
Yahoo! Babelfish (now replaced by Bing Translator) - http://www.microsofttranslator.com/.
WordNet - http://wordnet.princeton.edu/.
<?page no="162"?>
Appendices
Appendix 1
The list of all senses with the corresponding code used in the book.
CODE  SENSES
1  direct your gaze towards someone or something or in a specified direction
2  for a structure or part of body to have a view or outlook in a specified direction
3  express or show something to someone
4  ignore someone or something by pretending not to notice them
5  inspect something briefly
6  peruse a book or other written material
7  move round in order to investigate something
8  think of or regard in a particular manner
9  examine and consider
10  investigate in great detail
11  attempt to find
12  evaluate someone or something with a quick glance
13  have the appearance of
14  show likelihood of
15  appear your usual self
16  rely on someone or something
17  expect (hope) to do something
18  have the appearance of being as old as you are
19  express a perceived air of superiority
20  view someone with superiority
21  observe someone without showing embarrassment or fear
22  ignore wrongdoing
23  make future plans
24  evaluate someone carefully
25  take care of someone or something
26  reminisce past events
27  suffer a setback
28  eagerly await something or someone
29  pay a short visit
30  observe an event without getting involved
31  quickly take notice
32  bring an improvement to a situation
33  pay a social visit while going somewhere else
34  search for and find particular information in a piece of writing
35  have respect for someone
36  have a downcast or mournful look
37  express emotion (anger) by look or glance (look daggers at)
38  suggest to someone to be quick at doing something (look alive; look lively; look sharp)
39  do not act before considering the possible consequence (look before you leap)
40  appear weak or unimportant (look small)
41  search for and produce something (look something out)
42  question the quality of a gift or a favor received (look a gift horse in the mouth)
<?page no="163"?>
Appendix 2
List of the raw frequencies of occurrences of all of the predefined features per sense.
SENSES 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 FEATURES PresentSimple 1104 5 8 3 30 452 24 24 309 636 PresentSimple3rd 780 6 8 6 48 15 53 4 42 1 785 165 2 PresentProgressive 271 1 1 6 66 39 27 444 29 PastSimple 1829 10 1 8 6 93 73 5 24 77 1092 PastProgressive 165 1 7 2 6 6 4 178 15 Pes.Perfect 41 18 6 6 7 PresentPerfectProgressive 15 1 1 2 1 4 25 PastPerfect 37 9 4 9 PastPerfectProgressive 2 10 Will+inf 38 1 19 9 5 11 35 WillProgressive 1 1 9 WillPerfect WillPerfectProgressive Goingto 21 5 3 10 12 Modalpresent(verb) 157 1 1 1 1 54 16 9 41 74 Modalpast(verb) 46 1 12 7 3 20 39 PresentParticiple 293 2 1 4 17 29 1 7 137 121 PastParticiple 478 1 8 13 42 60 14 403 92 To-infinitive 464 1 5 5 92 19 38 134 99 Aspect: simple 5287 23 1 33 31 194 839 233 155 1520 1 2945 165 2 Aspect: progressive 433 2 1 1 6 23 50 7 15 336 94 Aspect: perfect 21 5 3 4 6 Aspect: perfectprogressive 1 Voice: active 5728 25 1 1 34 37 217 875 240 171 1857 1 3044 165 2 Voice: passive 14 19 2 3 1 Intransitive 605 3 5 99 11 3 1 11 23 Transitive 5044 22 1 1 29 37 114 877 235 171 1829 1 87 8 Complextransitive 65 3 6 1 16 Linkingverb 28 1 2 4 2935 157 2 Declarative 5094 25 1 1 32 36 210 772 235 163 1607 1 2958 164 2 Interrogative 72 2 7 2 3 71 75 1 Iimperative 576 1 7 115 3 7 182 12 Mainclause 3795 10 1 1 16 21 138 453 156 80 948 1700 130 2 Subordinateclause 1941 13 18 16 77 441 83 93 909 1 1303 34 Relative sub. 
clause 6 2 2 1 3 42 1 Subject: human 4656 9 1 1 32 33 185 638 173 117 1396 1364 7 2 Subject: animate 4682 9 1 1 32 33 185 636 173 117 1403 1402 7 2 Subject: inanimate 124 14 1 3 74 37 37 99 1 1123 12 Subject: concrete 4773 22 1 1 33 33 187 671 204 140 1446 1 2290 18 2 Subject: countable 4784 22 1 1 33 33 187 693 204 147 1471 1 2326 20 2 Subject: uncountable 19 1 1 15 6 6 30 201 3 Subject: abstract 33 1 1 37 6 13 54 235 1 Subject: proper 930 4 1 8 7 58 32 33 29 170 237 7 1 Subject: machine 1 <?page no="164"?> 164 Subject: location 4 1 5 13 Subject: quantity 50 9 1 2 34 34 Subject: possessive 1 1 1 5 8 1 Subject: expletive 11 2 1 6 306 108 Subject: simple 4655 21 1 1 32 32 184 679 198 139 1452 1 2265 14 1 Subject: compound 42 9 2 5 16 76 3 1 Subject: phrase 116 1 1 1 4 24 11 18 53 184 5 Subject: clause 5 1 1 21 1 Subject: singular 3924 17 1 1 29 27 168 422 106 94 882 1 1721 17 2 Subject: plural 858 5 4 6 18 270 98 53 592 607 1 Subject: pronoun 2940 4 1 17 22 92 441 114 58 885 987 5 1 Head: human 5 Head: animate 5 Head: inanimate 19 Head: concrete 19 Head: countable 13 Head: uncountable 10 Head: abstract 5 Head: proper 1 Head: machine Head: location 3 Head: quantity Head: possessive Head: expletive 2 Head: simple 5 Head: compound 1 Head: phrase 19 Head: clause Head: singular 14 Head: plural Head: pronoun Object: human 2109 1 7 20 65 18 5 384 100 2 Object: animate 2146 1 8 1 20 68 21 5 411 118 2 Object: inanimate 2585 22 1 17 35 72 604 160 126 1231 1 190 1 Object: concrete 4444 22 1 1 23 35 89 217 98 55 1093 1 257 3 Object: countable 4545 22 1 1 24 36 87 447 143 108 1333 1 253 2 Object: uncountable 183 1 5 214 37 22 299 54 1 Object: abstract 285 2 1 3 450 83 74 544 52 Object: proper 472 2 5 47 12 6 100 21 Object: machine 3 1 Object: location 55 1 2 7 4 1 10 7 Object: quantity 77 4 18 5 1 27 2 Object: possessive 15 1 1 5 5 1 1 Object: expletive 111 3 69 9 19 12 1 Object: simple 4099 14 1 20 26 83 433 154 108 1379 157 3 Object: compound 176 4 1 2 3 63 6 9 96 19 Object: 
phrase 484 4 1 4 8 8 179 31 24 325 1 133 Object: clause 120 1 3 105 36 15 61 74 7 Object: singular 3597 15 1 1 16 17 76 217 69 59 820 1 212 2 Object: plural 947 7 8 19 11 236 70 47 516 43 Object: pronoun 1423 1 6 4 13 56 9 10 149 19 2 <?page no="165"?> 165 Complement: human 10 1 334 Complement: animate 11 1 381 Complement: inanimate 16 2 1 12 593 7 Complement: concrete 23 2 1 11 860 5 Complement: countable 23 2 1 10 882 4 Complement: uncountable 4 1 2 91 2 Complement: abstract 4 1 1 115 3 Complement: proper 4 2 77 Complement: machine 1 Complement: location 1 4 5 Complement: quantity 1 4 Complement: possessive 8 Complement: expletive 1 1 21 1 Complement: simple 18 2 1 6 607 2 Complement: compound 1 3 137 2 Complement: phrase 9 1 3 376 6 1 Complement: clause 11 5 1 1 4 321 134 1 Complement: singular 17 2 1 9 676 4 Complement: plural 4 1 201 Complement: pronoun 8 1 5 32 1 SENSES 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 FEATURES PresentSimple 10 1 5 5 8 44 1 44 6 13 PresentSimple3rd 1 3 7 4 1 2 PresentProgressive 1 11 3 1 31 2 PastSimple 23 3 4 11 10 2 6 11 7 2 PastProgressive 1 1 1 3 2 9 1 Pes.Perfect 1 2 1 PresentPerfectProgressive 1 1 1 PastPerfect 1 1 1 PastPerfectProgressive 2 2 Will+inf 1 3 5 1 1 WillProgressive 1 WillPerfect WillPerfectProgressive Goingto 4 3 Modalpresent(verb) 1 1 2 3 11 5 1 7 2 Modalpast(verb) 5 3 PresentParticiple 7 10 13 1 2 1 2 PastParticiple 3 5 9 20 7 2 2 To-infinitive 2 1 4 26 1 2 12 1 3 Aspect: simple 1 10 2 36 1 36 5 89 96 4 11 97 1 19 24 Aspect: progressive 8 2 1 16 14 32 1 5 Aspect: perfect Aspect: perfectprogressive Voice: active 1 18 2 38 1 37 5 104 110 4 7 129 1 20 29 Voice: passive 1 4 Intransitive 6 2 5 51 3 6 16 10 Transitive 1 12 2 9 1 35 5 100 58 1 11 119 1 3 19 <?page no="166"?> 166 Complextransitive 28 1 3 Linkingverb 1 1 1 Declarative 1 16 1 33 1 37 5 100 106 4 11 126 1 20 17 Interrogative 2 5 1 1 Iimperative 2 1 3 3 2 12 Mainclause 1 9 2 23 23 4 48 42 1 4 88 1 9 26 Subordinateclause 9 15 1 14 1 56 67 3 7 41 11 3 
Relative sub. clause 1 1 Subject: human 1 11 1 36 1 26 5 79 83 2 8 113 1 18 16 Subject: animate 1 11 1 36 1 26 5 80 83 2 8 113 1 18 16 Subject: inanimate 7 10 5 3 4 1 Subject: concrete 1 11 1 36 1 31 5 90 85 2 9 116 1 19 16 Subject: countable 1 11 1 36 1 31 5 90 84 2 9 116 1 18 16 Subject: uncountable 2 3 2 1 Subject: abstract 2 2 2 1 Subject: proper 1 1 4 3 1 17 7 1 3 1 6 2 Subject: machine Subject: location Subject: quantity 1 1 1 Subject: possessive Subject: expletive 2 2 1 Subject: simple 1 11 1 36 1 32 5 87 83 2 8 115 1 16 15 Subject: compound 2 1 1 1 Subject: phrase 1 2 4 2 2 3 1 Subject: clause Subject: singular 1 6 35 12 4 64 55 2 3 73 1 11 10 Subject: plural 5 1 1 1 19 1 26 29 6 43 8 6 Subject: pronoun 6 28 14 4 46 45 1 2 76 3 9 Head: human Head: animate Head: inanimate Head: concrete Head: countable Head: uncountable Head: abstract Head: proper Head: machine Head: location Head: quantity Head: possessive Head: expletive Head: simple Head: compound Head: phrase Head: clause 1 Head: singular Head: plural Head: pronoun Object: human 2 25 2 3 55 3 4 9 15 Object: animate 2 25 2 3 57 3 4 9 15 Object: inanimate 1 12 13 29 2 36 45 1 2 62 1 1 2 <?page no="167"?> 167 Object: concrete 1 2 2 38 4 5 77 14 5 30 1 1 16 Object: countable 1 6 2 38 6 5 86 30 6 53 1 1 16 Object: uncountable 6 25 7 17 1 16 1 Object: abstract 10 27 16 33 1 1 39 1 Object: proper 6 1 6 5 2 Object: machine Object: location 1 Object: quantity 1 1 Object: possessive Object: expletive 3 7 5 1 Object: simple 5 2 36 29 5 87 27 1 5 58 1 14 Object: compound 2 1 3 1 Object: phrase 1 5 2 4 6 18 2 14 1 3 Object: clause 3 2 1 32 Object: singular 3 2 33 5 5 53 16 2 40 1 8 Object: plural 1 3 5 1 33 13 4 13 1 8 Object: pronoun 1 9 1 3 34 6 1 14 12 Complement: human 1 3 Complement: animate 1 3 Complement: inanimate 5 1 1 Complement: concrete 5 1 4 Complement: countable 5 2 4 Complement: uncountable Complement: abstract 1 Complement: proper Complement: machine Complement: location Complement: quantity 
Complement: possessive Complement: expletive Complement: simple 4 1 3 Complement: compound Complement: phrase 1 1 1 Complement: clause 3 Complement: singular 5 1 3 Complement: plural 1 Complement: pronoun 2 4 SENSES 31 32 33 34 35 FEATURES PresentSimple 5 5 PresentSimple3rd PresentProgressive PastSimple 2 11 3 PastProgressive 2 1 2 Pes.Perfect 1 PresentPerfectProgressive PastPerfect 2 PastPerfectProgressive Will+inf <?page no="168"?> 168 WillProgressive WillPerfect WillPerfectProgressive Goingto 1 Modalpresent(verb) 3 3 Modalpast(verb) 1 PresentParticiple PastParticiple 1 To-infinitive 3 3 Aspect: simple 2 26 16 Aspect: progressive 2 1 2 Aspect: perfect Aspect: perfectprogressive Voice: active 2 2 27 16 2 Voice: passive Intransitive 2 2 Transitive 2 22 16 Complextransitive 5 Linkingverb Declarative 2 2 23 16 2 Interrogative 1 Iimperative 3 Mainclause 2 1 15 6 2 Subordinateclause 1 12 10 Relative sub. clause Subject: human 1 25 15 Subject: animate 1 25 15 Subject: inanimate 1 1 Subject: concrete 1 1 25 15 1 Subject: countable 1 1 25 15 1 Subject: uncountable Subject: abstract Subject: proper 1 1 1 Subject: machine Subject: location Subject: quantity 1 Subject: possessive Subject: expletive Subject: simple 1 1 24 15 1 Subject: compound Subject: phrase 1 Subject: clause Subject: singular 1 16 9 Subject: plural 1 9 6 1 Subject: pronoun 16 8 Head: human Head: animate Head: inanimate Head: concrete Head: countable Head: uncountable <?page no="169"?> 169 Head: abstract Head: proper Head: machine Head: location Head: quantity Head: possessive Head: expletive Head: simple Head: compound Head: phrase Head: clause Head: singular Head: plural Head: pronoun Object: human 1 14 Object: animate 1 1 14 Object: inanimate 1 1 25 1 Object: concrete 1 1 22 14 1 Object: countable 1 2 24 14 1 Object: uncountable 2 Object: abstract 1 4 Object: proper 1 1 Object: machine Object: location Object: quantity Object: possessive 1 Object: expletive 1 Object: simple 1 2 23 12 1 Object: compound 3 
2 Object: phrase Object: clause Object: singular 1 1 14 7 1 Object: plural 1 10 7 Object: pronoun 1 1 9 Complement: human Complement: animate Complement: inanimate 1 Complement: concrete 1 Complement: countable 1 Complement: uncountable Complement: abstract Complement: proper Complement: machine Complement: location 1 Complement: quantity Complement: possessive Complement: expletive Complement: simple 1 Complement: compound Complement: phrase <?page no="170"?> 170 Complement: clause Complement: singular 1 Complement: plural Complement: pronoun 1 <?page no="171"?> 171 Appendix 3 List of the raw frequencies of occurrences of all of the nonpredefined features per sense. SENSES 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 FEATURES L1: after 15 1 2 3 2 1 2 1 L1: and 180 1 2 14 9 5 3 1 76 5 L1: at 3494 2 16 868 293 3 10 30 2 L1: but 28 1 3 12 2 7 36 5 L1: by 31 3 6 9 1 10 1 L1: for 16 11 3 3 1 1792 1 8 L1: forward 2 2 1 1 L1: he 83 7 1 2 4 49 L1: how 20 8 1 2 4 49 1 L1: in 146 7 2 16 5 7 L1: it 4 3 178 84 L1: just 29 7 1 1 6 1 L1: like 7 1 1378 133 L1: she 52 1 3 2 2 37 1 L1: that 122 9 1 48 19 8 56 326 9 L1: to 144 3 1 1 13 5 4 4 27 2 L1: up 346 1 2 4 2 L1: what 40 1 29 4 75 227 1 L1: when 188 1 2 1 3 108 7 10 32 8 2 L1: which 16 1 2 3 1 4 40 L1: who 94 4 2 24 10 12 48 184 4 L2: at 3494 2 16 868 293 3 10 30 2 L2: it 4 3 178 84 L2: out 127 17 2 1 2 L2: that 122 9 1 48 19 8 56 326 9 R1: after 15 1 2 3 2 1 2 1 R1: and 180 1 2 14 9 5 3 1 76 5 R1: around 20 207 1 10 4 R1: as 92 1 2 1 7 17 6 1 19 110 8 R1: at 3494 2 16 868 293 3 10 30 2 R1: away 64 1 R1: back 99 2 2 1 R1: down 197 2 1 1 1 1 R1: for 16 11 3 3 1 1792 1 8 R1: forward 2 2 1 1 R1: from 59 1 1 1 R1: if 147 2 1 1 5 176 20 9 57 48 1 R1: in 152 7 2 16 5 7 R1: into 136 3 1 2 152 1 3 1 R1: it 4 3 178 84 R1: like 7 1 1382 133 R1: on 34 4 6 1 18 1 3 7 R1: out 136 17 2 1 2 R1: over 126 2 26 1 1 1 2 <?page no="172"?> 172 R1: that 123 9 1 48 19 8 56 326 9 R1: through 50 1 38 2 1 3 R1: to 145 3 1 1 13 5 4 4 38 2 R1: up 357 1 2 4 2 R1: 
what 40 1 29 4 75 227 1 R2: around 18 204 1 10 4 R2: as 92 1 2 1 7 17 6 1 19 92 8 R2: at 3494 2 16 868 293 3 10 30 2 R2: for 16 11 3 3 1 1792 1 8 R2: from 51 1 1 1 R2: how 20 8 1 2 4 49 1 R2: if 147 2 1 1 5 176 20 9 57 48 1 R2: in 148 7 2 16 5 7 R2: into 136 3 1 2 152 1 3 1 R2: like 7 1 1378 133 R2: on 34 4 6 1 18 1 3 7 R2: over 122 2 26 1 1 1 2 R2: through 49 1 38 1 1 2 R2: to 144 3 1 1 13 5 4 4 27 2 R2: up 346 1 2 4 2 R2: what 40 1 29 4 75 227 1 Sub: after 15 1 2 3 2 1 2 1 Sub: and 180 1 2 14 9 5 3 1 76 5 Sub: as 90 1 2 1 7 17 6 1 16 68 8 Sub: at 3494 2 16 868 293 3 10 30 2 Sub: but 56 2 6 24 4 14 72 10 Sub: for 16 11 3 3 1 1792 1 8 Sub: from 51 1 1 1 Sub: how 20 8 1 2 4 49 1 Sub: if 147 2 1 1 5 176 20 9 57 48 1 Sub: in 146 7 2 16 5 7 Sub: like 7 1 1378 133 Sub: that 122 9 1 48 19 8 56 326 9 Sub: then 40 1 3 1 3 2 Sub: through 49 1 38 1 1 2 Sub: to 144 3 1 1 13 5 4 4 27 2 Sub: what 41 1 29 4 75 227 1 Sub: when 188 1 2 1 3 108 7 10 32 8 2 Sub: which 16 1 2 3 1 4 40 Sub: while 16 2 4 2 10 6 Sub: who 47 2 1 12 5 6 24 92 2 SENSES 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 FEATURES L1: after 37 L1: and 2 2 L1: at 1 5 1 L1: but 2 1 1 1 2 L1: by L1: for 2 1 <?page no="173"?> 173 L1: forward 1 1 74 L1: he 3 L1: how 2 1 2 L1: in 10 2 1 1 L1: it L1: just L1: like L1: she 1 1 L1: that 8 1 2 2 1 5 L1: to 117 2 15 2 72 L1: up 1 L1: what 1 1 L1: when 1 1 1 18 L1: which 1 3 1 1 L1: who 8 8 2 4 4 L2: at 1 5 1 L2: it 4 3 178 84 L2: out 1 1 1 L2: that 8 1 2 2 1 5 R1: after 37 R1: and 2 2 R1: around 1 1 R1: as 2 1 3 1 3 1 4 R1: at 1 5 1 R1: away R1: back 1 1 97 6 R1: down 1 1 10 R1: for 2 1 R1: forward 1 1 103 R1: from R1: if 3 2 1 2 6 1 R1: in 30 2 1 1 R1: into 1 1 2 1 R1: it R1: like R1: on 1 10 2 18 R1: out 1 1 1 R1: over 1 R1: that 8 1 2 2 1 5 R1: through R1: to 117 2 15 2 72 R1: up 1 R1: what 1 1 R2: around 1 1 R2: as 2 1 3 1 3 1 4 R2: at 1 5 1 R2: for 2 1 R2: from R2: how 2 1 2 R2: if 3 2 1 2 6 1 <?page no="174"?> 174 R2: in 29 2 1 1 R2: into 1 3 2 1 R2: like R2: on 1 10 2 
18
R2: over 1
R2: through
R2: to 117 2 15 2 72
R2: up 1
R2: what 1 1
Sub: after 37
Sub: and 2 2
Sub: as 2 1 3 1 3 1 4
Sub: at 1 5 1
Sub: but 4 2 2 2 4
Sub: for 2 1
Sub: from
Sub: how 2 1 2
Sub: if 3 2 1 2 6 1
Sub: in 10 2 1 1
Sub: like
Sub: that 8 1 2 2 1 5
Sub: then 2 1
Sub: through
Sub: to 117 2 15 2 72
Sub: what 1 1
Sub: when 1 1 1 18
Sub: which 1 1
Sub: while 2 1 1 2
Sub: who 4 4 1 2 2
SENSES 31 32 33 34 35
FEATURES
L1: after
L1: and
L1: at
L1: but
L1: by 1
L1: for 10
L1: forward
L1: he
L1: how 1
L1: in
L1: it
L1: just
L1: like
L1: she
L1: that 1 1
L1: to 1 10
L1: up 2 22 10
L1: what
<?page no="175"?>
L1: when 1
L1: which
L1: who 4 4
L2: at
L2: it
L2: out 22
L2: that 1 1
R1: after
R1: and
R1: around
R1: as
R1: at
R1: away
R1: back
R1: down
R1: for 10
R1: forward
R1: from
R1: if 1
R1: in
R1: into
R1: it
R1: like
R1: on 2
R1: out 26
R1: over
R1: that 1 1
R1: through
R1: to 1 10
R1: up 2 22 14
R1: what
R2: around
R2: as
R2: at
R2: for 10
R2: from
R2: how 1
R2: if 1
R2: in
R2: into
R2: like
R2: on 2
R2: over
R2: through
R2: to 1 10
R2: up 2 22 10
R2: what
Sub: after
Sub: and
Sub: as
<?page no="176"?>
Sub: at
Sub: but
Sub: for 10
Sub: from
Sub: how 1
Sub: if 1
Sub: in
Sub: like
Sub: that 1 1
Sub: then
Sub: through
Sub: to 1 10
Sub: what
Sub: when 1
Sub: which
Sub: while
Sub: who 2 2
<?page no="177"?>
Appendix 4
Sum of all token and type values of the attested contextual features per sense.
SENSES Predef. feat. type Predef. feat. token Nonpredef. feat. type Nonpredef. feat. token Feat. total type Feat. total token
direct your gaze towards someone or something or in a specified direction 90 95,101 86 24,213 176 119,314
for a structure [...]
to have a view or outlook in a specified direction 37 391 46 144 83 535
express or show something to someone 20 20 3 3 23 23
ignore someone or something by pretending not to notice them 18 18 0 0 18 18
inspect something briefly 44 567 16 81 60 648
peruse a book or other written material 41 640 27 144 68 784
move round in order to investigate something 53 3,076 45 667 98 3,743
think of or regard in a particular manner 76 13,538 68 5,726 144 19,264
examine and consider 67 3,781 62 1,764 129 5,545
investigate in great detail 65 2,715 49 520 114 3,235
attempt to find 85 30,025 69 8,241 154 38,266
evaluate someone or something with a quick glance 16 16 7 7 23 23
have the appearance of 100 41,226 81 9,932 181 51,158
show likelihood of 49 1,416 54 925 103 2,341
appear your usual self 17 28 1 1 18 29
rely on someone or something 62 2,053 36 551 98 2,604
expect (hope) to do something 18 18 1 1 19 19
have the appearance of being as old as you are 32 236 19 30 51 266
express a perceived air of superiority 21 31 2 2 23 33
view someone with superiority 47 734 25 112 72 846
observe someone without showing embarrassment or fear 12 12 0 0 12 12
ignore wrongdoing 46 568 19 77 65 645
make future plans 25 96 5 5 30 101
evaluate someone carefully 55 1,784 25 152 80 1,936
take care of someone or something 64 1,482 40 245 104 1,727
reminisce past events 22 42 1 6 23 48
suffer a setback 36 165 11 22 47 187
eagerly await something or someone 68 1,981 39 523 107 2,504
pay a short visit 18 18 4 4 22 22
observe an event without getting involved 34 248 8 54 42 302
quickly take notice 42 393 15 97 57 490
bring an improvement to a situation 16 22 3 6 19 28
pay a social visit while going somewhere else 24 31 0 0 24 31
search for and find particular information in a piece of writing 50 469 12 79 62 548
have respect for someone 31 289 16 85 47 374
<?page no="178"?>
Appendix 5
List of the contextual features with the values of their distinctiveness.
PRED. FEATURES Appears with senses Distinctiv.
NON-PRED. FEATURES Appears with senses Distinctiv.
Tense: Present Simple 23 1.52 L1: after 1 35
Tense: Present Simple 3rd 20 1.75 L1: and 1 35
Tense: Present Progressive 16 2.19 L1: at 9 3.89
Tense: Past Simple 25 1.4 L1: but 1 35
Tense: Past Progressive 19 1.84 L1: by 3 11.67
Tense: Pres. Perfect 10 3.5 L1: for 1 35
Tense: Present Perfect Progressive 10 3.5 L1: forward 2 17.5
Tense: Past Perfect 8 4.38 L1: he 2 17.5
Tense: Past Perfect Progressive 4 8.75 L1: how 12 2.92
Tense: will + inf 13 2.69 L1: in 1 35
Tense: will Progressive 4 8.75 L1: it 1 35
Tense: will Perfect 0 0 L1: just 3 11.67
Tense: will Perfect Progressive 0 0 L1: like 1 35
Tense: going to 8 4.38 L1: she 1 35
Tense: modal present 22 1.59 L1: that 1 35
Tense: modal past (verb) 10 3.5 L1: to 1 35
Tense: Present Participle 18 1.94 L1: up 13 2.69
Tense: Past Participle 18 1.94 L1: what 8 4.38
Tense: to infinitive 21 1.67 L1: when 1 35
Aspect: simple 33 1.06 L1: which 1 35
Aspect: progressive 22 1.59 L1: who 1 35
Aspect: perfect 6 5.83 L2: at 1 35
Aspect: perfect progressive 1 35 L2: it 12 2.92
Voice: active 35 1 L2: out 1 35
Voice: passive 7 5 L2: that 1 35
Syntactic features: intransitive 18 1.94 R1: after 1 35
Syntactic features: transitive 33 1.06 R1: and 1 35
Syntactic features: complex transitive 10 3.5 R1: around 1 35
Syntactic features: linking verb 10 3.5 R1: as 1 35
Syntactic features: declarative 35 1 R1: at 1 35
Syntactic features: interrogative 14 2.5 R1: away 2 17.5
Syntactic features: imperative 16 2.19 R1: back 1 35
Syntactic features: main 33 1.06 R1: down 11 3.18
Syntactic features: subordinate (with/no subordinator) 28 1.25 R1: for 7 5
<?page no="179"?>
Syntactic features: relative subordinate 9 3.89 R1: forward 3 11.67
Subject: human 33 1.06 R1: from 1 35
Subject: animate 33 1.06 R1: if 1 35
Subject: inanimate 19 1.84 R1: in 1 35
Subject: concrete 35 1 R1: into 1 35
Subject: countable 35 1 R1: it 7 5
Subject: uncountable 14 2.5 R1: like 1 35
Subject: abstract 14 2.5 R1: on 11 3.18
Subject: proper 29 1.21 R1: out 10 3.5
Subject: machine 1 35 R1: over 1 35
Subject: location 4 8.75 R1: that 3 11.67
Subject: quantity 11 3.18 R1: through 4 8.75
Subject: possessive 6 5.83 R1: to 4 8.75
Subject: expletive 10 3.5 R1: up 6 5.83
Subject: simple 35 1 R1: what 1 35
Subject: compound 13 2.69 R2: around 4 8.75
Subject: phrase 20 1.75 R2: as 1 35
Subject: clause 5 7 R2: at 1 35
Subject: singular 32 1.09 R2: for 2 17.5
Subject: plural 27 1.3 R2: from 2 17.5
Subject: pronoun 27 1.3 R2: how 1 35
Head: human 1 35 R2: if 1 35
Head: animate 1 35 R2: in 1 35
Head: inanimate 1 35 R2: into 1 35
Head: concrete 1 35 R2: like 1 35
Head: countable 1 35 R2: on 4 8.75
Head: uncountable 1 35 R2: over 10 3.5
Head: abstract 1 35 R2: through 1 35
Head: proper 1 35 R2: to 1 35
Head: machine 0 0 R2: up 1 35
Head: location 1 35 R2: what 1 35
Head: quantity 0 0 Subordinator: after 1 35
Head: possessive 0 0 Subordinator: and 1 35
Head: expletive 1 35 Subordinator: as 2 17.5
Head: simple 1 35 Subordinator: at 3 11.67
Head: compound 1 35 Subordinator: but 1 35
Head: phrase 1 35 Subordinator: for 9 3.89
Head: clause 1 35 Subordinator: from 1 35
Head: singular 1 35 Subordinator: how 1 35
Head: plural 0 0 Subordinator: if 1 35
Head: pronoun 0 0 Subordinator: in 5 7
Object: human 22 1.59 Subordinator: like 3 11.67
Object: animate 24 1.46 Subordinator: that 1 35
Object: inanimate 30 1.17 Subordinator: then 3 11.67
Object: concrete 32 1.09 Subordinator: through 2 17.5
Object: countable 32 1.09 Subordinator: to 1 35
Object: uncountable 18 1.94 Subordinator: what 1 35
Object: abstract 20 1.75 Subordinator: when 1 35
<?page no="180"?>
Object: proper 16 2.19 Subordinator: which 1 35
Object: machine 2 17.5 Subordinator: while 17 2.06
Object: location 10 3.5 Subordinator: who 2 17.5
Object: quantity 10 3.5
Object: possessive 8 4.38
Object: expletive 12 2.92
Object: simple 29 1.21
Object: compound 17 2.06
Object: phrase 23 1.52
Object: clause 14 2.5
Object: singular 30 1.17
Object: plural 24 1.46
Object: pronoun 24 1.46
Complement: human 5 7.00
Complement: animate 5 7.00
Complement: inanimate 11 3.18
Complement: concrete 10 3.5
Complement: countable 10 3.5
Complement: uncountable 6 5.83
Complement: abstract 7 5
Complement: proper 3 11.67
Complement: machine 1 35
Complement: location 4 8.75
Complement: quantity 2 17.50
Complement: possessive 1 35
Complement: expletive 4 8.75
Complement: simple 11 3.18
Complement: compound 5 7
Complement: phrase 10 3.5
Complement: clause 10 3.5
Complement: singular 10 3.5
Complement: plural 4 8.75
Complement: pronoun 8 4.38
<?page no="181"?>
Appendix 6
List of the predictive powers of all of the predefined features.
SENSES 1 2 3 4 5 6 7
FEATURES
Present Simple 1.72E-02 7.78E-05 0.00E+00 0.00E+00 1.25E-04 4.70E-05 4.68E-04
Present Simple 3rd 2.00E-02 1.54E-04 0.00E+00 0.00E+00 2.06E-04 1.54E-04 1.23E-03
Present Progressive 1.79E-02 0.00E+00 0.00E+00 0.00E+00 6.63E-05 6.63E-05 3.98E-04
Past Simple 2.20E-02 1.20E-04 0.00E+00 1.20E-05 9.60E-05 7.20E-05 1.12E-03
Past Progressive 2.14E-02 0.00E+00 1.29E-04 0.00E+00 0.00E+00 9.07E-04 2.59E-04
Pres.
Perfect 4.56E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Present Perfect Progressive 2.88E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.92E-03 1.92E-03 Past Perfect 7.23E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Past Perfect Progressive 3.13E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will + inf 2.23E-02 0.00E+00 0.00E+00 0.00E+00 5.87E-04 0.00E+00 0.00E+00 Will Progressive 2.08E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Going to 4.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Modal present (verb) 1.75E-02 1.11E-04 0.00E+00 0.00E+00 1.11E-04 1.11E-04 1.11E-04 Modal past (verb) 3.36E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 7.30E-04 Present Participle 2.50E-02 1.71E-04 0.00E+00 0.00E+00 8.56E-05 3.42E-04 1.45E-03 Past Participle 2.29E-02 0.00E+00 0.00E+00 0.00E+00 4.78E-05 3.82E-04 6.22E-04 To-infinitive 2.40E-02 5.19E-05 0.00E+00 0.00E+00 2.59E-04 0.00E+00 2.59E-04 Aspect: simple 1.33E-02 5.79E-05 0.00E+00 2.42E-06 8.33E-05 7.82E-05 4.89E-04 Aspect: progressive 1.86E-02 8.59E-05 4.27E-05 0.00E+00 4.27E-05 2.58E-04 9.87E-04 Aspect: perfect 8.14E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Aspect: perfect progressive 1.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Voice: active 1.25E-02 5.46E-05 2.29E-06 2.29E-06 7.43E-05 8.09E-05 4.74E-04 Voice: passive 4.55E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Intransitive 3.90E-02 1.93E-04 0.00E+00 0.00E+00 3.22E-04 0.00E+00 6.38E-03 Transitive 1.70E-02 7.42E-05 3.33E-06 3.33E-06 9.76E-05 1.25E-04 3.84E-04 Complex transitive 4.92E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.27E-03 Linking verb 8.94E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.20E-05 Declarative 1.22E-02 5.97E-05 2.29E-06 2.29E-06 7.66E-05 8.63E-05 5.03E-04 
Interrogative 2.11E-02 0.00E+00 0.00E+00 0.00E+00 5.86E-04 0.00E+00 0.00E+00
Imperative 3.84E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.69E-05 4.67E-04
Main clause 1.47E-02 3.88E-05 3.94E-06 3.94E-06 6.18E-05 8.12E-05 5.34E-04
Subordinate clause 1.33E-02 8.89E-05 0.00E+00 0.00E+00 1.23E-04 1.09E-04 5.27E-04
Relative subordinate clause 1.13E-02 3.77E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.77E-03
Subject: human 1.54E-02 2.97E-05 3.33E-06 3.33E-06 1.06E-04 1.09E-04 6.13E-04
Subject: animate 1.54E-02 2.97E-05 3.33E-06 3.33E-06 1.05E-04 1.08E-04 6.09E-04
Subject: inanimate 4.14E-03 4.67E-04 0.00E+00 0.00E+00 3.32E-05 0.00E+00 1.00E-04
Subject: concrete 1.31E-02 6.06E-05 2.86E-06 2.86E-06 9.09E-05 9.09E-05 5.15E-04
Subject: countable 1.30E-02 6.00E-05 2.86E-06 2.86E-06 9.00E-05 9.00E-05 5.09E-04
Subject: uncountable 4.57E-03 2.41E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.41E-04
Subject: abstract 5.82E-03 1.76E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.76E-04
Subject: proper 2.03E-02 8.72E-05 0.00E+00 2.17E-05 1.74E-04 1.53E-04 1.27E-03
Subject: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: location 4.35E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: quantity 3.20E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: possessive 9.80E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: expletive 2.48E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: simple 1.30E-02 5.89E-05 2.86E-06 2.86E-06 8.94E-05 8.94E-05 5.15E-04
Subject: compound 1.93E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: phrase 1.29E-02 1.11E-04 0.00E+00 0.00E+00 1.11E-04 1.11E-04 4.44E-04
Subject: clause 3.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="182"?>
Subject: singular 1.58E-02 6.84E-05 4.06E-06 4.06E-06 1.17E-04 1.09E-04 6.77E-04
Subject: plural 1.16E-02 6.78E-05 0.00E+00 0.00E+00 5.41E-05 8.11E-05 2.44E-04
Subject: pronoun 1.86E-02 2.52E-05 6.30E-06
0.00E+00 1.07E-04 1.39E-04 5.82E-04 Head: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: inanimate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: simple 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: phrase 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: human 3.33E-02 0.00E+00 1.59E-05 0.00E+00 1.10E-04 0.00E+00 3.15E-04 Object: animate 3.00E-02 0.00E+00 1.42E-05 0.00E+00 1.12E-04 1.42E-05 2.80E-04 Object: inanimate 1.61E-02 1.37E-04 0.00E+00 6.33E-06 1.06E-04 2.18E-04 4.48E-04 Object: concrete 2.09E-02 1.03E-04 4.69E-06 4.69E-06 1.08E-04 1.65E-04 4.19E-04 Object: countable 1.93E-02 9.31E-05 4.38E-06 4.38E-06 1.02E-04 1.53E-04 3.69E-04 Object: 
uncountable 1.09E-02 0.00E+00 0.00E+00 0.00E+00 6.00E-05 0.00E+00 2.99E-04 Object: abstract 8.51E-03 0.00E+00 0.00E+00 0.00E+00 5.95E-05 3.00E-05 8.95E-05 Object: proper 4.14E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.76E-04 4.38E-04 Object: machine 3.75E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: location 6.18E-02 0.00E+00 0.00E+00 0.00E+00 1.12E-03 0.00E+00 2.25E-03 Object: quantity 5.62E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.92E-03 0.00E+00 Object: possessive 6.25E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.17E-03 4.17E-03 Object: expletive 3.84E-02 0.00E+00 0.00E+00 0.00E+00 1.04E-03 0.00E+00 0.00E+00 Object: simple 2.06E-02 7.03E-05 5.17E-06 0.00E+00 1.00E-04 1.30E-04 4.16E-04 Object: compound 2.60E-02 5.91E-04 0.00E+00 0.00E+00 1.48E-04 2.96E-04 4.44E-04 Object: phrase 1.65E-02 1.36E-04 0.00E+00 3.39E-05 1.36E-04 2.72E-04 2.72E-04 Object: clause 1.86E-02 1.54E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.64E-04 Object: singular 2.25E-02 9.37E-05 6.33E-06 6.33E-06 1.00E-04 1.06E-04 4.75E-04 Object: plural 1.93E-02 1.43E-04 0.00E+00 0.00E+00 1.63E-04 3.87E-04 2.24E-04 Object: pronoun 3.29E-02 0.00E+00 2.29E-05 0.00E+00 1.39E-04 9.25E-05 3.00E-04 Complement: human 5.73E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: animate 5.54E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: inanimate 2.26E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: concrete 2.52E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: countable 2.46E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: uncountable 6.35E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: abstract 4.40E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: proper 1.61E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: location 2.27E-02 0.00E+00 0.00E+00 0.00E+00 
0.00E+00 0.00E+00 0.00E+00
Complement: quantity 1.00E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: expletive 1.04E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: simple 2.53E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: compound 1.39E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: phrase 2.25E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="183"?>
Complement: clause 2.28E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: singular 2.36E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: plural 4.83E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: pronoun 1.85E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
SENSES 8 9 10 11 12 13 14
FEATURES
Present Simple 7.05E-03 3.74E-04 3.74E-04 4.82E-03 0.00E+00 9.91E-03 0.00E+00
Present Simple 3rd 3.85E-04 1.36E-03 1.03E-04 1.08E-03 2.55E-05 2.02E-02 4.24E-03
Present Progressive 4.37E-03 2.58E-03 1.79E-03 2.94E-02 0.00E+00 1.92E-03 0.00E+00
Past Simple 8.76E-04 6.00E-05 2.88E-04 9.24E-04 0.00E+00 1.31E-02 0.00E+00
Past Progressive 7.78E-04 7.78E-04 5.18E-04 2.31E-02 0.00E+00 1.94E-03 0.00E+00
Pres.
Perfect 2.00E-02 0.00E+00 6.67E-03 6.67E-03 0.00E+00 7.78E-03 0.00E+00 Present Perfect Progressive 3.85E-03 1.92E-03 7.69E-03 4.81E-02 0.00E+00 0.00E+00 0.00E+00 Past Perfect 1.76E-02 0.00E+00 0.00E+00 7.81E-03 0.00E+00 1.76E-02 0.00E+00 Past Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 1.56E-01 0.00E+00 0.00E+00 0.00E+00 Will + inf 1.12E-02 5.28E-03 2.94E-03 6.46E-03 0.00E+00 2.06E-02 0.00E+00 Will Progressive 0.00E+00 0.00E+00 2.08E-02 1.88E-01 0.00E+00 0.00E+00 0.00E+00 Will Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Going to 1.06E-02 0.00E+00 6.36E-03 2.12E-02 0.00E+00 2.54E-02 0.00E+00 Modal present (verb) 6.02E-03 1.78E-03 1.00E-03 4.57E-03 0.00E+00 8.24E-03 0.00E+00 Modal past (verb) 8.76E-03 5.11E-03 2.19E-03 1.46E-02 0.00E+00 2.85E-02 0.00E+00 Present Participle 2.48E-03 8.56E-05 5.98E-04 1.17E-02 0.00E+00 1.03E-02 0.00E+00 Past Participle 2.01E-03 2.87E-03 6.69E-04 1.93E-02 0.00E+00 4.40E-03 0.00E+00 To-infinitive 4.76E-03 9.83E-04 1.97E-03 6.94E-03 0.00E+00 5.12E-03 0.00E+00 Aspect: simple 2.12E-03 5.88E-04 3.91E-04 3.83E-03 2.42E-06 7.43E-03 4.16E-04 Aspect: progressive 2.15E-03 3.00E-04 6.44E-04 1.44E-02 0.00E+00 4.03E-03 0.00E+00 Aspect: perfect 1.94E-02 0.00E+00 1.16E-02 1.55E-02 0.00E+00 2.33E-02 0.00E+00 Aspect: perfect progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Voice: active 1.91E-03 5.25E-04 3.74E-04 4.06E-03 2.29E-06 6.65E-03 3.61E-04 Voice: passive 6.17E-02 0.00E+00 6.49E-03 9.74E-03 0.00E+00 3.25E-03 0.00E+00 Intransitive 7.09E-04 1.93E-04 6.44E-05 7.09E-04 0.00E+00 1.48E-03 0.00E+00 Transitive 2.96E-03 7.92E-04 5.76E-04 6.16E-03 3.33E-06 2.93E-04 2.70E-05 Complex transitive 4.55E-03 0.00E+00 7.58E-04 1.21E-02 0.00E+00 0.00E+00 0.00E+00 Linking verb 0.00E+00 6.40E-05 0.00E+00 1.28E-04 0.00E+00 9.37E-02 5.01E-03 Declarative 1.85E-03 5.63E-04 3.90E-04 3.85E-03 2.29E-06 7.08E-03 3.93E-04 
Interrogative 2.05E-03 5.86E-04 8.79E-04 2.08E-02 0.00E+00 2.20E-02 2.93E-04
Imperative 7.67E-03 2.00E-04 4.67E-04 1.21E-02 0.00E+00 8.01E-04 0.00E+00
Main clause 1.75E-03 6.03E-04 3.09E-04 3.67E-03 0.00E+00 6.57E-03 5.03E-04
Subordinate clause 3.02E-03 5.68E-04 6.36E-04 6.22E-03 6.79E-06 8.91E-03 2.33E-04
Relative subordinate clause 0.00E+00 1.88E-03 0.00E+00 5.65E-03 0.00E+00 7.91E-02 1.88E-03
Subject: human 2.12E-03 5.74E-04 3.88E-04 4.63E-03 0.00E+00 4.52E-03 2.33E-05
Subject: animate 2.09E-03 5.69E-04 3.85E-04 4.62E-03 0.00E+00 4.61E-03 2.30E-05
Subject: inanimate 2.47E-03 1.23E-03 1.23E-03 3.30E-03 3.32E-05 3.75E-02 4.01E-04
Subject: concrete 1.85E-03 5.62E-04 3.85E-04 3.98E-03 2.86E-06 6.31E-03 4.94E-05
Subject: countable 1.89E-03 5.56E-04 4.01E-04 4.01E-03 2.86E-06 6.34E-03 5.46E-05
Subject: uncountable 3.61E-03 1.44E-03 1.44E-03 7.22E-03 0.00E+00 4.83E-02 7.21E-04
Subject: abstract 6.53E-03 1.06E-03 2.29E-03 9.52E-03 0.00E+00 4.14E-02 1.76E-04
Subject: proper 6.98E-04 7.20E-04 6.32E-04 3.71E-03 0.00E+00 5.17E-03 1.53E-04
Subject: machine 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: location 0.00E+00 0.00E+00 1.09E-02 5.43E-02 0.00E+00 1.41E-01 0.00E+00
Subject: quantity 5.76E-03 6.40E-04 1.28E-03 2.18E-02 0.00E+00 2.18E-02 0.00E+00
Subject: possessive 9.80E-03 0.00E+00 9.80E-03 4.90E-02 0.00E+00 7.84E-02 9.80E-03
Subject: expletive 4.51E-04 2.26E-04 0.00E+00 1.35E-03 0.00E+00 6.91E-02 2.44E-02
Subject: simple 1.90E-03 5.54E-04 3.89E-04 4.06E-03 2.86E-06 6.34E-03 3.91E-05
Subject: compound 4.15E-03 9.22E-04 2.30E-03 7.37E-03 0.00E+00 3.50E-02 1.38E-03
<?page no="184"?>
Subject: phrase 2.66E-03 1.22E-03 2.00E-03 5.88E-03 0.00E+00 2.04E-02 5.55E-04
Subject: clause 0.00E+00 6.90E-03 0.00E+00 6.90E-03 0.00E+00 1.45E-01 6.90E-03
Subject: singular 1.70E-03 4.28E-04 3.79E-04 3.56E-03 4.06E-06 6.94E-03 6.84E-05
Subject: plural 3.66E-03 1.33E-03 7.18E-04 8.02E-03 0.00E+00 8.22E-03 1.37E-05
Subject: pronoun 2.79E-03 7.21E-04 3.67E-04
5.60E-03 0.00E+00 6.24E-03 3.15E-05 Head: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: inanimate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: simple 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: phrase 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Head: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: human 1.03E-03 2.84E-04 7.86E-05 6.05E-03 0.00E+00 1.58E-03 3.14E-05 Object: animate 9.52E-04 2.94E-04 7.00E-05 5.75E-03 0.00E+00 1.65E-03 2.79E-05 Object: inanimate 3.76E-03 9.95E-04 7.84E-04 7.66E-03 6.33E-06 1.18E-03 6.33E-06 Object: concrete 1.02E-03 4.61E-04 2.59E-04 5.14E-03 4.69E-06 1.21E-03 1.41E-05 Object: countable 1.89E-03 6.06E-04 4.58E-04 5.65E-03 4.38E-06 1.07E-03 8.44E-06 Object: 
uncountable 1.28E-02 2.21E-03 1.31E-03 1.79E-02 0.00E+00 3.23E-03 6.00E-05 Object: abstract 1.34E-02 2.48E-03 2.21E-03 1.62E-02 0.00E+00 1.55E-03 0.00E+00 Object: proper 4.12E-03 1.05E-03 5.26E-04 8.77E-03 0.00E+00 1.84E-03 0.00E+00 Object: machine 0.00E+00 1.25E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: location 7.87E-03 4.49E-03 1.12E-03 1.12E-02 0.00E+00 7.87E-03 0.00E+00 Object: quantity 1.31E-02 3.65E-03 7.30E-04 1.97E-02 0.00E+00 1.46E-03 0.00E+00 Object: possessive 2.08E-02 0.00E+00 2.08E-02 4.17E-03 0.00E+00 4.17E-03 0.00E+00 Object: expletive 2.39E-02 3.11E-03 6.57E-03 4.15E-03 0.00E+00 3.46E-04 0.00E+00 Object: simple 2.17E-03 7.72E-04 5.42E-04 6.92E-03 0.00E+00 7.88E-04 1.52E-05 Object: compound 9.31E-03 8.87E-04 1.33E-03 1.42E-02 0.00E+00 2.81E-03 0.00E+00 Object: phrase 6.08E-03 1.05E-03 8.16E-04 1.10E-02 3.39E-05 4.52E-03 0.00E+00 Object: clause 1.62E-02 5.57E-03 2.32E-03 9.43E-03 0.00E+00 1.14E-02 1.08E-03 Object: singular 1.36E-03 4.32E-04 3.69E-04 5.13E-03 6.33E-06 1.33E-03 1.27E-05 Object: plural 4.81E-03 1.43E-03 9.57E-04 1.05E-02 0.00E+00 8.76E-04 0.00E+00 Object: pronoun 1.29E-03 2.08E-04 2.31E-04 3.44E-03 0.00E+00 4.39E-04 4.63E-05 Complement: human 5.74E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.91E-01 0.00E+00 Complement: animate 5.04E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.92E-01 0.00E+00 Complement: inanimate 2.83E-04 1.41E-04 0.00E+00 1.69E-03 0.00E+00 8.37E-02 9.88E-04 Complement: concrete 2.19E-04 1.10E-04 0.00E+00 1.21E-03 0.00E+00 9.42E-02 5.48E-04 Complement: countable 2.14E-04 1.07E-04 0.00E+00 1.07E-03 0.00E+00 9.44E-02 4.28E-04 Complement: uncountable 1.59E-03 0.00E+00 0.00E+00 3.18E-03 0.00E+00 1.44E-01 3.18E-03 Complement: abstract 1.10E-03 0.00E+00 0.00E+00 1.10E-03 0.00E+00 1.26E-01 3.30E-03 Complement: proper 0.00E+00 0.00E+00 0.00E+00 8.03E-03 0.00E+00 3.09E-01 0.00E+00 Complement: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00 Complement: location 0.00E+00 0.00E+00 0.00E+00 9.09E-02 
0.00E+00 1.14E-01 0.00E+00
Complement: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.00E-01 0.00E+00
Complement: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 0.00E+00
Complement: expletive 0.00E+00 1.04E-02 0.00E+00 0.00E+00 0.00E+00 2.19E-01 1.04E-02
<?page no="185"?>
Complement: simple 2.81E-04 1.40E-04 0.00E+00 8.42E-04 0.00E+00 8.52E-02 2.81E-04
Complement: compound 0.00E+00 0.00E+00 0.00E+00 4.17E-03 0.00E+00 1.90E-01 2.78E-03
Complement: phrase 2.50E-04 0.00E+00 0.00E+00 7.50E-04 0.00E+00 9.40E-02 1.50E-03
Complement: clause 1.04E-03 2.07E-04 2.07E-04 8.30E-04 0.00E+00 6.66E-02 2.78E-02
Complement: singular 2.78E-04 1.39E-04 0.00E+00 1.25E-03 0.00E+00 9.40E-02 5.56E-04
Complement: plural 0.00E+00 0.00E+00 0.00E+00 1.21E-03 0.00E+00 2.43E-01 0.00E+00
Complement: pronoun 2.32E-03 0.00E+00 0.00E+00 1.16E-02 0.00E+00 7.41E-02 2.32E-03
SENSES 15 16 17 18 19 20 21
FEATURES
Present Simple 0.00E+00 7.33E-04 0.00E+00 1.56E-04 1.57E-05 7.78E-05 0.00E+00
Present Simple 3rd 5.15E-05 3.60E-04 2.55E-05 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Present Progressive 0.00E+00 7.28E-04 0.00E+00 0.00E+00 0.00E+00 6.63E-05 0.00E+00
Past Simple 0.00E+00 2.28E-04 0.00E+00 0.00E+00 0.00E+00 2.76E-04 0.00E+00
Past Progressive 0.00E+00 1.29E-04 0.00E+00 0.00E+00 0.00E+00 1.29E-04 0.00E+00
Pres.
Perfect 0.00E+00 7.78E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Present Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Past Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.95E-03 0.00E+00 Past Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will + inf 0.00E+00 1.17E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Will Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Going to 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Modal present (verb) 0.00E+00 1.56E-03 0.00E+00 1.11E-04 1.11E-04 2.23E-04 0.00E+00 Modal past (verb) 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Present Participle 0.00E+00 1.71E-04 0.00E+00 5.98E-04 0.00E+00 0.00E+00 0.00E+00 Past Participle 0.00E+00 9.56E-05 0.00E+00 0.00E+00 0.00E+00 1.43E-04 0.00E+00 To-infinitive 0.00E+00 2.59E-04 0.00E+00 0.00E+00 0.00E+00 1.03E-04 5.19E-05 Aspect: simple 5.15E-06 2.80E-04 2.42E-06 2.52E-05 5.15E-06 9.09E-05 2.42E-06 Aspect: progressive 0.00E+00 3.86E-04 0.00E+00 3.43E-04 0.00E+00 8.59E-05 0.00E+00 Aspect: perfect 0.00E+00 1.55E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Aspect: perfect progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Voice: active 4.29E-06 2.71E-04 2.29E-06 3.94E-05 4.29E-06 8.31E-05 2.29E-06 Voice: passive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Intransitive 0.00E+00 0.00E+00 0.00E+00 3.87E-04 0.00E+00 0.00E+00 0.00E+00 Transitive 0.00E+00 4.04E-04 3.33E-06 4.03E-05 6.67E-06 3.03E-05 3.33E-06 Complex transitive 0.00E+00 3.03E-03 0.00E+00 0.00E+00 0.00E+00 2.12E-02 0.00E+00 Linking verb 6.40E-05 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.20E-05 0.00E+00 Declarative 4.86E-06 2.75E-04 2.29E-06 3.83E-05 2.29E-06 7.89E-05 2.29E-06 
Interrogative 0.00E+00 2.93E-04 0.00E+00 0.00E+00 0.00E+00 5.86E-04 0.00E+00
Imperative 0.00E+00 5.34E-04 0.00E+00 1.33E-04 6.69E-05 2.00E-04 0.00E+00
Main clause 7.88E-06 3.17E-04 3.94E-06 3.48E-05 7.88E-06 8.88E-05 0.00E+00
Subordinate clause 0.00E+00 2.87E-04 0.00E+00 6.14E-05 0.00E+00 1.03E-04 6.79E-06
Relative subordinate clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: human 6.67E-06 2.75E-04 3.33E-06 3.64E-05 3.33E-06 1.19E-04 3.33E-06
Subject: animate 6.67E-06 2.73E-04 3.33E-06 3.61E-05 3.33E-06 1.18E-04 3.33E-06
Subject: inanimate 0.00E+00 7.01E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: concrete 5.43E-06 2.40E-04 2.86E-06 3.03E-05 2.86E-06 9.91E-05 2.86E-06
Subject: countable 5.43E-06 2.64E-04 2.86E-06 3.00E-05 2.86E-06 9.80E-05 2.86E-06
Subject: uncountable 0.00E+00 1.68E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: abstract 0.00E+00 3.00E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: proper 2.17E-05 3.06E-04 2.17E-05 2.17E-05 0.00E+00 8.72E-05 0.00E+00
Subject: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: quantity 0.00E+00 5.12E-03 0.00E+00 6.40E-04 0.00E+00 0.00E+00 0.00E+00
Subject: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="186"?>
Subject: expletive 0.00E+00 9.03E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: simple 2.86E-06 2.21E-04 2.86E-06 3.09E-05 2.86E-06 1.01E-04 2.86E-06
Subject: compound 4.61E-04 3.68E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: phrase 0.00E+00 1.88E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: singular 8.13E-06 1.45E-04 4.06E-06 2.41E-05 0.00E+00 1.41E-04 0.00E+00
Subject: plural 0.00E+00 8.13E-04 0.00E+00 6.78E-05 1.37E-05 1.37E-05 1.37E-05
Subject: pronoun 6.30E-06 1.96E-04 0.00E+00
3.78E-05 0.00E+00 1.77E-04 0.00E+00 Head: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: inanimate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: simple 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: phrase 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: human 0.00E+00 6.15E-04 0.00E+00 0.00E+00 3.14E-05 3.94E-04 0.00E+00 Object: animate 0.00E+00 5.46E-04 0.00E+00 0.00E+00 2.79E-05 3.50E-04 0.00E+00 Object: inanimate 0.00E+00 4.91E-04 6.33E-06 7.47E-05 0.00E+00 8.10E-05 0.00E+00 Object: concrete 0.00E+00 3.25E-04 4.69E-06 9.38E-06 9.38E-06 1.79E-04 0.00E+00 Object: countable 0.00E+00 3.35E-04 4.38E-06 2.53E-05 8.44E-06 1.61E-04 0.00E+00 Object: 
uncountable 0.00E+00 2.33E-03 0.00E+00 3.58E-04 0.00E+00 0.00E+00 0.00E+00 Object: abstract 0.00E+00 1.43E-03 0.00E+00 2.99E-04 0.00E+00 0.00E+00 0.00E+00 Object: proper 0.00E+00 2.28E-03 0.00E+00 0.00E+00 0.00E+00 5.26E-04 0.00E+00 Object: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: location 0.00E+00 1.12E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: quantity 0.00E+00 7.30E-04 0.00E+00 0.00E+00 0.00E+00 7.30E-04 0.00E+00 Object: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: simple 0.00E+00 4.51E-04 0.00E+00 2.52E-05 1.00E-05 1.81E-04 0.00E+00 Object: compound 0.00E+00 1.03E-03 0.00E+00 2.96E-04 0.00E+00 0.00E+00 0.00E+00 Object: phrase 0.00E+00 7.14E-04 3.39E-05 1.70E-04 0.00E+00 6.78E-05 0.00E+00 Object: clause 0.00E+00 3.09E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: singular 0.00E+00 2.19E-04 0.00E+00 1.87E-05 1.27E-05 2.06E-04 0.00E+00 Object: plural 0.00E+00 8.55E-04 2.04E-05 6.13E-05 0.00E+00 1.02E-04 0.00E+00 Object: pronoun 0.00E+00 4.39E-04 0.00E+00 0.00E+00 2.29E-05 2.08E-04 0.00E+00 Complement: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: inanimate 0.00E+00 7.05E-04 0.00E+00 0.00E+00 0.00E+00 7.05E-04 0.00E+00 Complement: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.48E-04 0.00E+00 Complement: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.35E-04 0.00E+00 Complement: uncountable 0.00E+00 7.94E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: abstract 0.00E+00 5.49E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 
0.00E+00 0.00E+00 0.00E+00
Complement: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="187"?>
Complement: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: simple 0.00E+00 4.21E-04 0.00E+00 0.00E+00 0.00E+00 5.61E-04 0.00E+00
Complement: compound 0.00E+00 1.39E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: phrase 2.50E-04 2.50E-04 0.00E+00 0.00E+00 0.00E+00 2.50E-04 0.00E+00
Complement: clause 2.07E-04 2.07E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.95E-04 0.00E+00
Complement: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.63E-03 0.00E+00
SENSES 22 23 24 25 26 27 28
FEATURES
Present Simple 7.78E-05 0.00E+00 1.25E-04 6.86E-04 0.00E+00 1.57E-05 6.86E-04
Present Simple 3rd 7.70E-05 0.00E+00 1.80E-04 1.03E-04 0.00E+00 0.00E+00 0.00E+00
Present Progressive 7.28E-04 0.00E+00 1.99E-04 6.63E-05 0.00E+00 0.00E+00 2.05E-03
Past Simple 3.60E-05 4.80E-05 1.32E-04 1.20E-04 2.40E-05 7.20E-05 1.32E-04
Past Progressive 1.29E-04 1.29E-04 3.89E-04 2.59E-04 0.00E+00 0.00E+00 1.17E-03
Present Perfect 1.11E-03 0.00E+00 2.22E-03 0.00E+00 1.11E-03 0.00E+00 0.00E+00
Present Perfect Progressive 0.00E+00 0.00E+00 1.92E-03 0.00E+00 0.00E+00 0.00E+00 1.92E-03
Past Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.95E-03
Past Perfect Progressive 0.00E+00 0.00E+00 3.13E-02 0.00E+00 0.00E+00 0.00E+00 3.13E-02
Will + inf 5.87E-04 0.00E+00 1.76E-03 2.94E-03 0.00E+00 0.00E+00 5.87E-04
Will Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.08E-02
Will Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Will Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Going to 0.00E+00 0.00E+00 8.48E-03 6.36E-03 0.00E+00 0.00E+00 0.00E+00
Modal present (verb) 3.34E-04 0.00E+00 1.23E-03 5.57E-04 0.00E+00 1.11E-04 7.80E-04
Modal past (verb) 0.00E+00 0.00E+00 3.65E-03 2.19E-03 0.00E+00 0.00E+00 0.00E+00
Present Participle 0.00E+00 0.00E+00 8.54E-04 1.11E-03 0.00E+00 8.56E-05 1.71E-04
Past Participle 2.39E-04 0.00E+00 4.31E-04 9.56E-04 0.00E+00 0.00E+00 3.34E-04
To-infinitive 2.07E-04 0.00E+00 1.35E-03 0.00E+00 5.19E-05 1.03E-04 6.21E-04
Aspect: simple 9.09E-05 1.27E-05 2.25E-04 2.42E-04 1.00E-05 2.79E-05 2.45E-04
Aspect: progressive 4.27E-05 0.00E+00 6.87E-04 6.01E-04 0.00E+00 0.00E+00 1.37E-03
Aspect: perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Aspect: perfect progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Voice: active 8.09E-05 1.09E-05 2.27E-04 2.40E-04 8.86E-06 1.54E-05 2.82E-04
Voice: passive 0.00E+00 0.00E+00 3.25E-03 0.00E+00 0.00E+00 1.30E-02 0.00E+00
Intransitive 1.29E-04 0.00E+00 3.22E-04 3.29E-03 1.93E-04 0.00E+00 3.87E-04
Transitive 1.18E-04 1.70E-05 3.37E-04 1.95E-04 3.33E-06 3.70E-05 4.01E-04
Complex transitive 0.00E+00 0.00E+00 0.00E+00 7.58E-04 0.00E+00 0.00E+00 2.27E-03
Linking verb 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.20E-05
Declarative 8.86E-05 1.20E-05 2.39E-04 2.54E-04 9.71E-06 2.63E-05 3.01E-04
Interrogative 0.00E+00 0.00E+00 1.46E-03 2.93E-04 0.00E+00 0.00E+00 2.93E-04
Imperative 0.00E+00 0.00E+00 0.00E+00 2.00E-04 0.00E+00 0.00E+00 1.33E-04
Main clause 8.88E-05 1.55E-05 1.85E-04 1.62E-04 3.94E-06 1.55E-05 3.40E-04
Subordinate clause 9.57E-05 6.79E-06 3.83E-04 4.58E-04 2.04E-05 4.79E-05 2.80E-04
Relative subordinate clause 0.00E+00 0.00E+00 1.88E-03 1.88E-03 0.00E+00 0.00E+00 0.00E+00
Subject: human 8.61E-05 1.67E-05 2.62E-04 2.75E-04 6.67E-06 2.67E-05 3.75E-04
Subject: animate 8.55E-05 1.64E-05 2.63E-04 2.73E-04 6.67E-06 2.64E-05 3.72E-04
Subject: inanimate 2.34E-04 0.00E+00 3.34E-04 1.67E-04 0.00E+00 1.00E-04 1.34E-04
Subject: concrete 8.54E-05 1.37E-05 2.48E-04 2.34E-04 5.43E-06 2.49E-05 3.19E-04
Subject: countable 8.46E-05 1.37E-05 2.45E-04 2.29E-04 5.43E-06 2.46E-05 3.16E-04
Subject: uncountable 4.81E-04 0.00E+00 0.00E+00 7.21E-04 0.00E+00 4.81E-04 2.41E-04
Subject: abstract 3.53E-04 0.00E+00 0.00E+00 3.53E-04 0.00E+00 3.53E-04 1.76E-04
Subject: proper 6.55E-05 2.17E-05 3.71E-04 1.53E-04 2.17E-05 0.00E+00 6.55E-05
Subject: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="188"?>
Subject: quantity 6.40E-04 0.00E+00 0.00E+00 6.40E-04 0.00E+00 0.00E+00 0.00E+00
Subject: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: expletive 0.00E+00 0.00E+00 4.51E-04 0.00E+00 0.00E+00 0.00E+00 4.51E-04
Subject: simple 8.94E-05 1.40E-05 2.43E-04 2.32E-04 5.71E-06 2.23E-05 3.22E-04
Subject: compound 0.00E+00 0.00E+00 9.22E-04 4.61E-04 0.00E+00 4.61E-04 4.61E-04
Subject: phrase 1.11E-04 0.00E+00 2.22E-04 4.44E-04 0.00E+00 2.22E-04 2.22E-04
Subject: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: singular 4.84E-05 1.63E-05 2.58E-04 2.22E-04 8.13E-06 1.22E-05 2.94E-04
Subject: plural 2.57E-04 1.37E-05 3.52E-04 3.93E-04 0.00E+00 8.11E-05 5.83E-04
Subject: pronoun 8.85E-05 2.52E-05 2.91E-04
2.84E-04 6.30E-06 1.26E-05 4.81E-04 Head: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: inanimate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: simple 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: phrase 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.00E+00 Head: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: human 3.14E-05 4.73E-05 8.67E-04 4.73E-05 0.00E+00 6.32E-05 1.42E-04 Object: animate 2.79E-05 4.21E-05 7.98E-04 4.21E-05 0.00E+00 5.58E-05 1.26E-04 Object: inanimate 1.80E-04 1.23E-05 2.24E-04 2.80E-04 6.33E-06 1.23E-05 3.86E-04 Object: concrete 1.88E-05 2.34E-05 3.62E-04 6.59E-05 0.00E+00 2.34E-05 1.41E-04 Object: countable 2.53E-05 2.13E-05 3.64E-04 1.27E-04 0.00E+00 2.53E-05 2.25E-04 Object: 
uncountable 1.49E-03 0.00E+00 4.18E-04 1.02E-03 6.00E-05 0.00E+00 9.56E-04 Object: abstract 8.06E-04 0.00E+00 4.78E-04 9.85E-04 3.00E-05 3.00E-05 1.16E-03 Object: proper 8.75E-05 0.00E+00 5.26E-04 4.38E-04 0.00E+00 0.00E+00 1.76E-04 Object: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: quantity 0.00E+00 0.00E+00 0.00E+00 7.30E-04 0.00E+00 0.00E+00 0.00E+00 Object: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: expletive 0.00E+00 0.00E+00 1.04E-03 2.42E-03 0.00E+00 0.00E+00 1.73E-03 Object: simple 1.46E-04 2.52E-05 4.36E-04 1.36E-04 5.17E-06 2.52E-05 2.91E-04 Object: compound 0.00E+00 0.00E+00 1.48E-04 4.44E-04 0.00E+00 0.00E+00 1.48E-04 Object: phrase 1.36E-04 0.00E+00 2.04E-04 6.12E-04 0.00E+00 6.78E-05 4.76E-04 Object: clause 4.64E-04 0.00E+00 3.09E-04 1.54E-04 0.00E+00 0.00E+00 4.95E-03 Object: singular 3.13E-05 3.13E-05 3.32E-04 1.00E-04 0.00E+00 1.27E-05 2.50E-04 Object: plural 2.04E-05 0.00E+00 6.72E-04 2.65E-04 0.00E+00 8.17E-05 2.65E-04 Object: pronoun 2.29E-05 6.92E-05 7.86E-04 1.39E-04 0.00E+00 2.29E-05 3.23E-04 Complement: human 0.00E+00 0.00E+00 0.00E+00 5.74E-04 0.00E+00 0.00E+00 1.72E-03 Complement: animate 0.00E+00 0.00E+00 0.00E+00 5.04E-04 0.00E+00 0.00E+00 1.51E-03 Complement: inanimate 0.00E+00 0.00E+00 0.00E+00 1.41E-04 0.00E+00 0.00E+00 1.41E-04 Complement: concrete 0.00E+00 0.00E+00 0.00E+00 1.10E-04 0.00E+00 0.00E+00 4.38E-04 Complement: countable 0.00E+00 0.00E+00 0.00E+00 2.14E-04 0.00E+00 0.00E+00 4.28E-04 Complement: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: abstract 0.00E+00 0.00E+00 0.00E+00 1.10E-03 0.00E+00 0.00E+00 0.00E+00 Complement: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 <?page no="189"?> 189 Complement: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: location 0.00E+00 
0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: simple 0.00E+00 0.00E+00 0.00E+00 1.40E-04 0.00E+00 0.00E+00 4.21E-04
Complement: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Complement: phrase 0.00E+00 0.00E+00 0.00E+00 2.50E-04 0.00E+00 0.00E+00 2.50E-04
Complement: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.22E-04
Complement: singular 0.00E+00 0.00E+00 0.00E+00 1.39E-04 0.00E+00 0.00E+00 4.17E-04
Complement: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.21E-03
Complement: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 9.26E-03
SENSES 29 30 31 32 33 34 35
FEATURES
Present Simple 0.00E+00 9.35E-05 2.03E-04 0.00E+00 0.00E+00 7.78E-05 7.78E-05
Present Simple 3rd 2.55E-05 5.15E-05 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Present Progressive 0.00E+00 0.00E+00 1.33E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Past Simple 0.00E+00 8.40E-05 2.40E-05 0.00E+00 2.40E-05 1.32E-04 3.60E-05
Past Progressive 0.00E+00 0.00E+00 1.29E-04 2.59E-04 0.00E+00 1.29E-04 0.00E+00
Present Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.11E-03 0.00E+00
Present Perfect Progressive 0.00E+00 0.00E+00 1.92E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Past Perfect 0.00E+00 0.00E+00 1.95E-03 0.00E+00 0.00E+00 3.91E-03 0.00E+00
Past Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Will + inf 0.00E+00 5.87E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Will Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Will Perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Will Perfect Progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Going to 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.12E-03
Modal present (verb) 0.00E+00 0.00E+00 2.23E-04 0.00E+00 0.00E+00 3.34E-04 3.34E-04
Modal past (verb) 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 7.30E-04
Present Participle 0.00E+00 8.56E-05 1.71E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Past Participle 0.00E+00 9.56E-05 9.56E-05 0.00E+00 0.00E+00 4.78E-05 0.00E+00
To-infinitive 0.00E+00 5.19E-05 1.55E-04 0.00E+00 0.00E+00 1.55E-04 1.55E-04
Aspect: simple 2.42E-06 4.79E-05 6.06E-05 0.00E+00 5.15E-06 6.55E-05 4.03E-05
Aspect: progressive 0.00E+00 4.27E-05 2.15E-04 8.59E-05 0.00E+00 4.27E-05 0.00E+00
Aspect: perfect 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Aspect: perfect progressive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Voice: active 2.29E-06 4.37E-05 6.34E-05 4.29E-06 4.29E-06 5.91E-05 3.49E-05
Voice: passive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Intransitive 0.00E+00 1.03E-03 6.44E-04 1.29E-04 0.00E+00 0.00E+00 0.00E+00
Transitive 3.33E-06 1.00E-05 6.39E-05 0.00E+00 6.67E-06 7.42E-05 5.39E-05
Complex transitive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.79E-03 0.00E+00
Linking verb 0.00E+00 3.20E-05 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Declarative 2.29E-06 4.80E-05 4.06E-05 4.86E-06 4.86E-06 5.51E-05 3.83E-05
Interrogative 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.93E-04 0.00E+00
Imperative 0.00E+00 0.00E+00 8.01E-04 0.00E+00 0.00E+00 2.00E-04 0.00E+00
Main clause 3.94E-06 3.48E-05 1.01E-04 7.88E-06 3.94E-06 5.79E-05 2.33E-05
Subordinate clause 0.00E+00 7.54E-05 2.04E-05 0.00E+00 6.79E-06 8.21E-05 6.82E-05
Relative subordinate clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: human 3.33E-06 5.97E-05 5.30E-05 0.00E+00 3.33E-06 8.30E-05 4.97E-05
Subject: animate 3.33E-06 5.91E-05 5.27E-05 0.00E+00 3.33E-06 8.21E-05 4.94E-05
Subject: inanimate 0.00E+00 3.32E-05 0.00E+00 3.32E-05 0.00E+00 0.00E+00 0.00E+00
Subject: concrete 2.86E-06 5.23E-05 4.40E-05 2.86E-06 2.86E-06 6.89E-05 4.14E-05
Subject: countable 2.86E-06 4.91E-05 4.37E-05 2.86E-06 2.86E-06 6.80E-05 4.09E-05
Subject: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="190"?>
Subject: proper 2.17E-05 1.31E-04 4.38E-05 0.00E+00 2.17E-05 2.17E-05 2.17E-05
Subject: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.40E-04 0.00E+00
Subject: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: expletive 0.00E+00 2.26E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: simple 2.86E-06 4.49E-05 4.20E-05 2.86E-06 2.86E-06 6.71E-05 4.20E-05
Subject: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: phrase 0.00E+00 3.33E-04 1.11E-04 0.00E+00 0.00E+00 1.11E-04 0.00E+00
Subject: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Subject: singular 4.06E-06 4.44E-05 4.03E-05 0.00E+00 4.06E-06 6.44E-05 3.63E-05
Subject: plural 0.00E+00 1.09E-04 8.11E-05 1.37E-05 0.00E+00 1.22E-04 8.11E-05
Subject: pronoun 0.00E+00 1.89E-05 5.70E-05
0.00E+00 0.00E+00 1.01E-04 5.07E-05 Head: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: inanimate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: location 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: simple 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: phrase 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Head: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: human 0.00E+00 0.00E+00 2.36E-04 0.00E+00 1.59E-05 0.00E+00 2.21E-04 Object: animate 0.00E+00 0.00E+00 2.10E-04 0.00E+00 1.42E-05 1.42E-05 1.96E-04 Object: inanimate 6.33E-06 6.33E-06 1.23E-05 6.33E-06 6.33E-06 1.56E-04 0.00E+00 Object: concrete 4.69E-06 4.69E-06 7.53E-05 4.69E-06 4.69E-06 1.03E-04 6.59E-05 Object: countable 4.38E-06 4.38E-06 6.78E-05 4.38E-06 8.44E-06 1.02E-04 5.94E-05 Object: 
uncountable 0.00E+00 0.00E+00 6.00E-05 0.00E+00 0.00E+00 1.19E-04 0.00E+00 Object: abstract 0.00E+00 0.00E+00 3.00E-05 0.00E+00 3.00E-05 1.20E-04 0.00E+00 Object: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 8.75E-05 8.75E-05 Object: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: location 0.00E+00 1.12E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.17E-03 0.00E+00 Object: expletive 0.00E+00 0.00E+00 3.46E-04 0.00E+00 0.00E+00 3.46E-04 0.00E+00 Object: simple 0.00E+00 5.17E-06 7.03E-05 5.17E-06 1.00E-05 1.16E-04 6.03E-05 Object: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.44E-04 2.96E-04 Object: phrase 3.39E-05 0.00E+00 1.02E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Object: singular 6.33E-06 0.00E+00 5.00E-05 6.33E-06 6.33E-06 8.77E-05 4.37E-05 Object: plural 0.00E+00 2.04E-05 1.63E-04 0.00E+00 2.04E-05 2.04E-04 1.43E-04 Object: pronoun 0.00E+00 0.00E+00 2.78E-04 0.00E+00 2.29E-05 2.29E-05 2.08E-04 Complement: human 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: animate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: inanimate 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.41E-04 0.00E+00 Complement: concrete 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.10E-04 0.00E+00 Complement: countable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.07E-04 0.00E+00 <?page no="191"?> 191 Complement: uncountable 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: abstract 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: proper 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: machine 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: location 0.00E+00 
0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.27E-02 0.00E+00 Complement: quantity 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: possessive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: expletive 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: simple 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.40E-04 0.00E+00 Complement: compound 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: phrase 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: clause 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: singular 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.39E-04 0.00E+00 Complement: plural 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Complement: pronoun 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.32E-03 0.00E+00 <?page no="192"?> 192 Appendix 7 List of the predictive powers of all of the non-predefined features. SENSES 1 2 3 4 5 6 7 FEATURES L1: after 2.34E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.56E-02 3.13E-02 L1: and 6.00E-01 3.33E-03 0.00E+00 0.00E+00 0.00E+00 6.67E-03 4.67E-02 L1: at 8.22E-02 4.67E-05 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.77E-04 L1: but 2.77E-01 9.90E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.97E-02 L1: by 1.67E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.61E-02 0.00E+00 L1: for 8.66E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.95E-03 L1: forward 1.22E-02 1.22E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: he 2.79E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.35E-02 L1: how 1.83E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: in 7.41E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: it 1.49E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: just 2.15E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: like 4.61E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: she 5.20E-01 1.00E-02 0.00E+00 0.00E+00 0.00E+00 
0.00E+00 3.00E-02 L1: that 1.97E-01 1.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.62E-03 L1: to 3.40E-01 7.09E-03 0.00E+00 0.00E+00 2.36E-03 0.00E+00 2.36E-03 L1: up 6.82E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: what 1.32E-02 3.30E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: when 4.90E-01 2.60E-03 0.00E+00 0.00E+00 5.21E-03 2.60E-03 7.81E-03 L1: which 2.19E-01 1.37E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.74E-02 L1: who 2.26E-01 9.62E-03 0.00E+00 0.00E+00 0.00E+00 4.81E-03 0.00E+00 L2: at 7.39E-01 4.20E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.39E-03 L2: it 1.24E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L2: out 7.30E-01 9.77E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.15E-02 L2: that 1.97E-01 1.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.62E-03 R1: after 2.34E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.56E-02 3.13E-02 R1: and 6.00E-01 3.33E-03 0.00E+00 0.00E+00 0.00E+00 6.67E-03 4.67E-02 R1: around 8.20E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 8.48E-01 R1: as 3.30E-01 3.58E-03 0.00E+00 0.00E+00 7.17E-03 3.58E-03 2.51E-02 R1: at 7.39E-01 4.20E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.39E-03 R1: away 4.92E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: back 4.74E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: down 8.33E-02 8.45E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: for 1.24E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 8.50E-04 R1: forward 6.01E-03 6.01E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: from 9.52E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: if 3.04E-01 4.14E-03 0.00E+00 0.00E+00 2.07E-03 2.07E-03 1.04E-02 R1: in 6.82E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: into 2.12E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: it 4.60E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: like 2.89E-02 3.40E-03 0.00E+00 0.00E+00 5.10E-03 8.50E-04 0.00E+00 R1: on 7.27E-02 9.09E-03 0.00E+00 0.00E+00 0.00E+00 
0.00E+00 1.07E-03 R1: out 7.88E-01 1.25E-02 0.00E+00 0.00E+00 1.63E-01 6.25E-03 6.25E-03 R1: over 6.61E-02 4.84E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.37E-04 R1: that 1.32E-01 0.00E+00 2.63E-03 0.00E+00 0.00E+00 1.00E-01 0.00E+00 R1: through 8.33E-02 1.73E-03 0.00E+00 0.00E+00 5.75E-04 0.00E+00 5.75E-04 R1: to 1.47E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: up 1.06E-01 2.64E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R1: what 1.88E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.13E-01 R2: around 3.52E-01 3.83E-03 0.00E+00 0.00E+00 7.66E-03 3.83E-03 2.68E-02 R2: as 7.39E-01 4.20E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 3.39E-03 <?page no="193"?> 193 R2: at 4.33E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.98E-03 R2: for 4.72E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R2: from 2.20E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R2: how 3.04E-01 4.14E-03 0.00E+00 0.00E+00 2.07E-03 2.07E-03 1.04E-02 R2: if 6.79E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R2: in 4.44E-01 9.80E-03 0.00E+00 0.00E+00 0.00E+00 3.27E-03 0.00E+00 R2: into 4.61E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R2: like 7.94E-02 9.35E-03 0.00E+00 0.00E+00 1.40E-02 2.34E-03 0.00E+00 R2: on 7.82E-02 1.28E-03 0.00E+00 0.00E+00 1.67E-02 6.41E-04 6.41E-04 R2: over 5.33E-01 0.00E+00 1.09E-02 0.00E+00 0.00E+00 4.13E-01 0.00E+00 R2: through 3.40E-01 7.09E-03 0.00E+00 0.00E+00 2.36E-03 0.00E+00 2.36E-03 R2: to 8.87E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R2: up 1.06E-01 2.64E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 R2: what 2.34E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.56E-02 3.13E-02 Sub: after 6.00E-01 3.33E-03 0.00E+00 0.00E+00 0.00E+00 6.67E-03 4.67E-02 Sub: and 1.94E-01 2.16E-03 0.00E+00 0.00E+00 4.31E-03 2.16E-03 1.51E-02 Sub: as 2.46E-01 1.40E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.13E-03 Sub: at 2.77E-01 9.90E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.97E-02 Sub: but 9.62E-04 0.00E+00 
0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.61E-04 Sub: for 9.44E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Sub: from 2.20E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Sub: how 3.04E-01 4.14E-03 0.00E+00 0.00E+00 2.07E-03 2.07E-03 1.04E-02 Sub: if 1.48E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Sub: in 1.54E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Sub: like 1.97E-01 1.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.62E-03 Sub: that 2.52E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.29E-03 Sub: then 2.66E-01 0.00E+00 5.44E-03 0.00E+00 0.00E+00 2.07E-01 0.00E+00 Sub: through 3.40E-01 7.09E-03 0.00E+00 0.00E+00 2.36E-03 0.00E+00 2.36E-03 Sub: to 1.08E-01 2.63E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 Sub: what 4.90E-01 2.60E-03 0.00E+00 0.00E+00 5.21E-03 2.60E-03 7.81E-03 Sub: when 2.32E-01 1.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.90E-02 Sub: which 2.05E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.56E-03 5.12E-03 Sub: while 1.13E-01 4.81E-03 0.00E+00 0.00E+00 0.00E+00 2.41E-03 0.00E+00 Sub: who 2.34E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.56E-02 3.13E-02 SENSES 8 9 10 11 12 13 14 FEATURES L1: after 0.00E+00 4.69E-02 3.13E-02 1.56E-02 0.00E+00 3.13E-02 1.56E-02 L1: and 3.00E-02 1.67E-02 0.00E+00 1.00E-02 3.33E-03 2.53E-01 1.67E-02 L1: at 2.04E-02 6.89E-03 7.00E-05 2.36E-04 0.00E+00 7.06E-04 4.67E-05 L1: but 1.19E-01 1.98E-02 0.00E+00 6.93E-02 0.00E+00 3.56E-01 4.95E-02 L1: by 3.23E-02 4.84E-02 5.38E-03 5.38E-02 0.00E+00 5.38E-03 0.00E+00 L1: for 1.62E-03 1.62E-03 5.40E-04 9.70E-01 5.40E-04 4.33E-03 0.00E+00 L1: forward 0.00E+00 6.10E-03 6.10E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 L1: he 3.36E-03 6.71E-03 0.00E+00 1.34E-02 0.00E+00 1.64E-01 0.00E+00 L1: how 7.33E-03 9.16E-04 1.83E-03 3.66E-03 0.00E+00 4.49E-02 9.16E-04 L1: in 3.55E-02 1.02E-02 8.12E-02 2.54E-02 0.00E+00 3.55E-02 0.00E+00 L1: it 0.00E+00 1.12E-02 0.00E+00 0.00E+00 0.00E+00 6.62E-01 3.12E-01 L1: just 5.19E-02 7.41E-03 0.00E+00 7.41E-03 
0.00E+00 4.44E-02 7.41E-03
L1: like 0.00E+00 0.00E+00 6.60E-04 0.00E+00 0.00E+00 9.07E-01 8.76E-02
L1: she 0.00E+00 2.00E-02 0.00E+00 2.00E-02 0.00E+00 3.70E-01 0.00E+00
L1: that 7.75E-02 3.07E-02 1.29E-02 9.05E-02 0.00E+00 5.27E-01 1.45E-02
L1: to 3.07E-02 1.18E-02 9.46E-03 9.46E-03 0.00E+00 6.38E-02 4.73E-03
L1: up 1.97E-04 0.00E+00 0.00E+00 3.95E-04 0.00E+00 7.89E-04 3.95E-04
L1: what 9.57E-03 1.32E-03 0.00E+00 2.47E-02 0.00E+00 7.49E-02 3.30E-04
L1: when 2.81E-01 1.82E-02 2.60E-02 8.33E-02 0.00E+00 2.08E-02 5.21E-03
L1: which 4.11E-02 0.00E+00 1.37E-02 5.48E-02 0.00E+00 5.48E-01 0.00E+00
<?page no="194"?>
L1: who 5.77E-02 2.40E-02 2.89E-02 1.15E-01 0.00E+00 4.42E-01 9.62E-03
L2: at 1.84E-01 6.20E-02 6.30E-04 2.12E-03 0.00E+00 6.35E-03 4.20E-04
L2: it 0.00E+00 9.29E-04 0.00E+00 0.00E+00 0.00E+00 5.51E-02 2.60E-02
L2: out 5.75E-03 0.00E+00 0.00E+00 1.15E-02 0.00E+00 0.00E+00 0.00E+00
L2: that 7.75E-02 3.07E-02 1.29E-02 9.05E-02 0.00E+00 5.27E-01 1.45E-02
R1: after 0.00E+00 4.69E-02 3.13E-02 1.56E-02 0.00E+00 3.13E-02 1.56E-02
R1: and 3.00E-02 1.67E-02 0.00E+00 1.00E-02 3.33E-03 2.53E-01 1.67E-02
R1: around 4.10E-03 0.00E+00 0.00E+00 4.10E-02 0.00E+00 1.64E-02 0.00E+00
R1: as 6.09E-02 2.15E-02 3.58E-03 6.81E-02 0.00E+00 3.94E-01 2.87E-02
R1: at 1.84E-01 6.20E-02 6.30E-04 2.12E-03 0.00E+00 6.35E-03 4.20E-04
R1: away 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 7.69E-03 0.00E+00
R1: back 9.57E-03 0.00E+00 0.00E+00 9.57E-03 0.00E+00 4.78E-03 0.00E+00
R1: down 4.23E-04 4.23E-04 0.00E+00 4.23E-04 0.00E+00 4.23E-04 0.00E+00
R1: for 2.31E-04 2.31E-04 7.71E-05 1.39E-01 7.71E-05 6.19E-04 0.00E+00
R1: forward 0.00E+00 3.00E-03 3.00E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: from 1.61E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.61E-02 1.61E-02
R1: if 3.64E-01 4.14E-02 1.86E-02 1.18E-01 0.00E+00 9.94E-02 2.07E-03
R1: in 3.14E-02 8.97E-03 7.18E-02 2.24E-02 0.00E+00 3.14E-02 0.00E+00
R1: into 0.00E+00 1.59E-03 0.00E+00 0.00E+00 0.00E+00 9.45E-02 4.46E-02
R1: it 0.00E+00 0.00E+00 6.60E-04 0.00E+00 0.00E+00 9.07E-01 8.73E-02
R1: like 1.53E-02 8.50E-04 0.00E+00 2.55E-03 0.00E+00 5.95E-03 0.00E+00
R1: on 5.35E-04 0.00E+00 0.00E+00 1.07E-03 0.00E+00 0.00E+00 0.00E+00
R1: out 0.00E+00 0.00E+00 0.00E+00 6.25E-03 0.00E+00 1.25E-02 0.00E+00
R1: over 2.58E-02 1.02E-02 4.30E-03 3.01E-02 0.00E+00 1.75E-01 4.84E-03
R1: that 5.26E-03 2.63E-03 0.00E+00 0.00E+00 0.00E+00 7.90E-03 0.00E+00
R1: through 7.47E-03 2.87E-03 2.30E-03 2.30E-03 0.00E+00 2.18E-02 1.15E-03
R1: to 4.12E-04 0.00E+00 0.00E+00 8.23E-04 0.00E+00 1.65E-03 8.23E-04
R1: up 7.65E-02 1.06E-02 0.00E+00 1.98E-01 0.00E+00 5.99E-01 2.64E-03
R1: what 1.05E-03 0.00E+00 0.00E+00 1.05E-02 0.00E+00 4.19E-03 0.00E+00
R2: around 6.51E-02 2.30E-02 3.83E-03 7.28E-02 0.00E+00 3.52E-01 3.07E-02
R2: as 1.84E-01 6.20E-02 6.30E-04 2.12E-03 0.00E+00 6.35E-03 4.20E-04
R2: at 8.10E-04 8.10E-04 2.70E-04 4.85E-01 2.70E-04 2.17E-03 0.00E+00
R2: for 9.26E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 9.26E-03 9.26E-03
R2: from 8.79E-02 1.10E-02 2.20E-02 4.40E-02 0.00E+00 5.38E-01 1.10E-02
R2: how 3.64E-01 4.14E-02 1.86E-02 1.18E-01 0.00E+00 9.94E-02 2.07E-03
R2: if 3.21E-02 9.17E-03 7.34E-02 2.29E-02 0.00E+00 3.21E-02 0.00E+00
R2: in 6.54E-03 0.00E+00 4.97E-01 3.27E-03 0.00E+00 9.80E-03 3.27E-03
R2: into 0.00E+00 0.00E+00 6.60E-04 0.00E+00 0.00E+00 9.07E-01 8.76E-02
R2: like 4.21E-02 2.34E-03 0.00E+00 7.01E-03 0.00E+00 1.64E-02 0.00E+00
R2: on 0.00E+00 0.00E+00 0.00E+00 6.41E-04 0.00E+00 1.28E-03 0.00E+00
R2: over 1.09E-02 1.09E-02 0.00E+00 0.00E+00 0.00E+00 2.17E-02 0.00E+00
R2: through 3.07E-02 1.18E-02 9.46E-03 9.46E-03 0.00E+00 6.38E-02 4.73E-03
R2: to 2.56E-03 0.00E+00 0.00E+00 5.13E-03 0.00E+00 1.03E-02 5.13E-03
R2: up 7.65E-02 1.06E-02 0.00E+00 1.98E-01 0.00E+00 5.99E-01 2.64E-03
R2: what 0.00E+00 4.69E-02 3.13E-02 1.56E-02 0.00E+00 3.13E-02 1.56E-02
Sub: after 3.00E-02 1.67E-02 0.00E+00 1.00E-02 3.33E-03 2.53E-01 1.67E-02
Sub: and 3.66E-02 1.29E-02 2.16E-03 3.45E-02 0.00E+00 1.47E-01 1.72E-02
Sub: as 6.12E-02 2.07E-02 2.10E-04 7.07E-04 0.00E+00 2.12E-03 1.40E-04
Sub: at 1.19E-01 1.98E-02 0.00E+00 6.93E-02 0.00E+00 3.56E-01 4.95E-02
Sub: but 1.80E-04 1.80E-04 6.00E-05 1.08E-01 6.00E-05 4.81E-04 0.00E+00
Sub: for 1.85E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.85E-02 1.85E-02
Sub: from 8.79E-02 1.10E-02 2.20E-02 4.40E-02 0.00E+00 5.38E-01 1.10E-02
Sub: how 3.64E-01 4.14E-02 1.86E-02 1.18E-01 0.00E+00 9.94E-02 2.07E-03
Sub: if 7.11E-03 2.03E-03 1.62E-02 5.08E-03 0.00E+00 7.11E-03 0.00E+00
Sub: in 0.00E+00 0.00E+00 2.20E-04 0.00E+00 0.00E+00 3.02E-01 2.92E-02
Sub: like 7.75E-02 3.07E-02 1.29E-02 9.05E-02 0.00E+00 5.27E-01 1.45E-02
Sub: that 1.89E-02 6.29E-03 0.00E+00 1.89E-02 0.00E+00 1.26E-02 0.00E+00
Sub: then 5.44E-03 5.44E-03 0.00E+00 0.00E+00 0.00E+00 1.09E-02 0.00E+00
<?page no="195"?>
Sub: through 3.07E-02 1.18E-02 9.46E-03 9.46E-03 0.00E+00 6.38E-02 4.73E-03
Sub: to 7.63E-02 1.05E-02 0.00E+00 1.97E-01 0.00E+00 5.97E-01 2.63E-03
Sub: what 2.81E-01 1.82E-02 2.60E-02 8.33E-02 0.00E+00 2.08E-02 5.21E-03
Sub: when 4.35E-02 0.00E+00 1.45E-02 5.80E-02 0.00E+00 5.80E-01 0.00E+00
Sub: which 0.00E+00 0.00E+00 2.56E-03 1.28E-02 0.00E+00 7.67E-03 0.00E+00
Sub: while 2.88E-02 1.20E-02 1.44E-02 5.77E-02 0.00E+00 2.21E-01 4.81E-03
Sub: who 0.00E+00 4.69E-02 3.13E-02 1.56E-02 0.00E+00 3.13E-02 1.56E-02

SENSES 15 16 17 18 19 20 21 FEATURES
L1: after 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: and 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.33E-05 0.00E+00
L1: but 0.00E+00 1.98E-02 0.00E+00 9.90E-03 0.00E+00 0.00E+00 0.00E+00
L1: by 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: for 0.00E+00 1.08E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: forward 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: he 0.00E+00 1.01E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: how 0.00E+00 1.83E-03 0.00E+00 0.00E+00 0.00E+00 9.16E-04 0.00E+00
L1: in 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.08E-02 0.00E+00
L1: it 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: just 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: like 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: she 1.00E-02 1.00E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: that 0.00E+00 1.29E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: to 0.00E+00 2.77E-01 0.00E+00 4.73E-03 0.00E+00 0.00E+00 0.00E+00
L1: up 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: what 0.00E+00 3.30E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: when 0.00E+00 2.60E-03 0.00E+00 2.60E-03 0.00E+00 2.60E-03 0.00E+00
L1: which 0.00E+00 1.37E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: who 0.00E+00 1.92E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L2: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.10E-04 0.00E+00
L2: it 0.00E+00 1.24E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L2: out 0.00E+00 0.00E+00 0.00E+00 5.75E-03 0.00E+00 0.00E+00 0.00E+00
L2: that 0.00E+00 1.29E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: after 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: and 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: around 0.00E+00 4.10E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: as 0.00E+00 7.17E-03 0.00E+00 3.58E-03 0.00E+00 1.08E-02 0.00E+00
R1: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.10E-04 0.00E+00
R1: away 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: back 0.00E+00 4.78E-03 4.78E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: down 0.00E+00 0.00E+00 0.00E+00 4.23E-04 0.00E+00 0.00E+00 0.00E+00
R1: for 0.00E+00 1.54E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: forward 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: from 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: if 0.00E+00 0.00E+00 0.00E+00 6.21E-03 0.00E+00 4.14E-03 0.00E+00
R1: in 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.35E-01 0.00E+00
R1: into 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: it 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: like 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 8.50E-04 0.00E+00
R1: on 0.00E+00 0.00E+00 0.00E+00 5.35E-04 0.00E+00 0.00E+00 0.00E+00
R1: out 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.25E-03 0.00E+00 0.00E+00
R1: over 0.00E+00 4.30E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: that 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: through 0.00E+00 6.72E-02 0.00E+00 1.15E-03 0.00E+00 0.00E+00 0.00E+00
R1: to 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="196"?>
R1: up 0.00E+00 2.64E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: what 0.00E+00 1.05E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: around 0.00E+00 7.66E-03 0.00E+00 3.83E-03 0.00E+00 1.15E-02 0.00E+00
R2: as 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.10E-04 0.00E+00
R2: at 0.00E+00 5.40E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: from 0.00E+00 2.20E-02 0.00E+00 0.00E+00 0.00E+00 1.10E-02 0.00E+00
R2: how 0.00E+00 0.00E+00 0.00E+00 6.21E-03 0.00E+00 4.14E-03 0.00E+00
R2: if 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.33E-01 0.00E+00
R2: in 0.00E+00 0.00E+00 0.00E+00 3.27E-03 0.00E+00 9.80E-03 0.00E+00
R2: into 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: like 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.34E-03 0.00E+00
R2: on 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.41E-04 0.00E+00 0.00E+00
R2: over 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: through 0.00E+00 2.77E-01 0.00E+00 4.73E-03 0.00E+00 0.00E+00 0.00E+00
R2: to 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: up 0.00E+00 2.64E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: what 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: after 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: and 0.00E+00 4.31E-03 0.00E+00 2.16E-03 0.00E+00 6.47E-03 0.00E+00
Sub: as 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 7.00E-05 0.00E+00
Sub: at 0.00E+00 1.98E-02 0.00E+00 9.90E-03 0.00E+00 0.00E+00 0.00E+00
Sub: but 0.00E+00 1.20E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: from 0.00E+00 2.20E-02 0.00E+00 0.00E+00 0.00E+00 1.10E-02 0.00E+00
Sub: how 0.00E+00 0.00E+00 0.00E+00 6.21E-03 0.00E+00 4.14E-03 0.00E+00
Sub: if 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.02E-02 0.00E+00
Sub: in 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: like 0.00E+00 1.29E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: that 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.26E-02 0.00E+00
Sub: then 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: through 0.00E+00 2.77E-01 0.00E+00 4.73E-03 0.00E+00 0.00E+00 0.00E+00
Sub: to 0.00E+00 2.63E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: what 0.00E+00 2.60E-03 0.00E+00 2.60E-03 0.00E+00 2.60E-03 0.00E+00
Sub: when 0.00E+00 1.45E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: which 0.00E+00 2.56E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: while 0.00E+00 9.62E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: who 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00

SENSES 22 23 24 25 26 27 28 FEATURES
L1: after 0.00E+00 0.00E+00 5.78E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: and 0.00E+00 0.00E+00 0.00E+00 6.67E-03 0.00E+00 0.00E+00 6.67E-03
L1: at 0.00E+00 0.00E+00 0.00E+00 1.18E-04 0.00E+00 0.00E+00 2.33E-05
L1: but 0.00E+00 0.00E+00 9.90E-03 9.90E-03 0.00E+00 0.00E+00 1.98E-02
L1: by 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 5.40E-04
L1: forward 6.10E-03 0.00E+00 6.10E-03 0.00E+00 0.00E+00 0.00E+00 4.51E-01
L1: he 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: how 0.00E+00 0.00E+00 1.83E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: in 0.00E+00 0.00E+00 0.00E+00 1.02E-02 0.00E+00 0.00E+00 5.08E-03
L1: it 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: just 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: like 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: she 0.00E+00 0.00E+00 0.00E+00 1.00E-02 0.00E+00 0.00E+00 0.00E+00
L1: that 1.62E-03 0.00E+00 3.23E-03 3.23E-03 0.00E+00 1.62E-03 8.08E-03
L1: to 3.55E-02 0.00E+00 0.00E+00 4.73E-03 0.00E+00 0.00E+00 1.70E-01
L1: up 0.00E+00 1.97E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="197"?>
L1: what 0.00E+00 0.00E+00 0.00E+00 3.30E-04 0.00E+00 0.00E+00 0.00E+00
L1: when 0.00E+00 0.00E+00 0.00E+00 4.69E-02 0.00E+00 0.00E+00 0.00E+00
L1: which 0.00E+00 0.00E+00 4.11E-02 1.37E-02 0.00E+00 1.37E-02 0.00E+00
L1: who 0.00E+00 0.00E+00 1.92E-02 0.00E+00 0.00E+00 4.81E-03 9.62E-03
L2: at 0.00E+00 0.00E+00 0.00E+00 1.06E-03 0.00E+00 0.00E+00 2.10E-04
L2: it 0.00E+00 0.00E+00 9.29E-04 0.00E+00 0.00E+00 0.00E+00 5.51E-02
L2: out 0.00E+00 5.75E-03 5.75E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L2: that 1.62E-03 0.00E+00 3.23E-03 3.23E-03 0.00E+00 1.62E-03 8.08E-03
R1: after 0.00E+00 0.00E+00 5.78E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: and 0.00E+00 0.00E+00 0.00E+00 6.67E-03 0.00E+00 0.00E+00 6.67E-03
R1: around 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.10E-03
R1: as 3.58E-03 0.00E+00 0.00E+00 1.08E-02 0.00E+00 0.00E+00 3.58E-03
R1: at 0.00E+00 0.00E+00 0.00E+00 1.06E-03 0.00E+00 0.00E+00 2.10E-04
R1: away 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: back 0.00E+00 0.00E+00 0.00E+00 4.64E-01 2.87E-02 0.00E+00 0.00E+00
R1: down 4.23E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 4.23E-03 0.00E+00
R1: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 7.71E-05
R1: forward 3.00E-03 0.00E+00 3.00E-03 0.00E+00 0.00E+00 0.00E+00 3.09E-01
R1: from 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: if 2.07E-03 0.00E+00 4.14E-03 1.24E-02 0.00E+00 0.00E+00 2.07E-03
R1: in 0.00E+00 0.00E+00 0.00E+00 8.97E-03 0.00E+00 0.00E+00 4.48E-03
R1: into 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: it 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: like 0.00E+00 0.00E+00 0.00E+00 8.50E-03 0.00E+00 1.70E-03 0.00E+00
R1: on 0.00E+00 5.35E-04 5.35E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: out 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: over 5.37E-04 0.00E+00 1.08E-03 1.08E-03 0.00E+00 5.37E-04 2.69E-03
R1: that 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: through 8.62E-03 0.00E+00 0.00E+00 1.15E-03 0.00E+00 0.00E+00 4.14E-02
R1: to 0.00E+00 4.12E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: up 0.00E+00 0.00E+00 0.00E+00 2.64E-03 0.00E+00 0.00E+00 0.00E+00
R1: what 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.05E-03
R2: around 3.83E-03 0.00E+00 0.00E+00 1.15E-02 0.00E+00 0.00E+00 3.83E-03
R2: as 0.00E+00 0.00E+00 0.00E+00 1.06E-03 0.00E+00 0.00E+00 2.10E-04
R2: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.70E-04
R2: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: from 0.00E+00 0.00E+00 2.20E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: how 2.07E-03 0.00E+00 4.14E-03 1.24E-02 0.00E+00 0.00E+00 2.07E-03
R2: if 0.00E+00 0.00E+00 0.00E+00 9.17E-03 0.00E+00 0.00E+00 4.59E-03
R2: in 6.54E-03 0.00E+00 3.27E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: into 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: like 0.00E+00 0.00E+00 0.00E+00 2.34E-02 0.00E+00 4.67E-03 0.00E+00
R2: on 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: over 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: through 3.55E-02 0.00E+00 0.00E+00 4.73E-03 0.00E+00 0.00E+00 1.70E-01
R2: to 0.00E+00 2.56E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: up 0.00E+00 0.00E+00 0.00E+00 2.64E-03 0.00E+00 0.00E+00 0.00E+00
R2: what 0.00E+00 0.00E+00 5.78E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: after 0.00E+00 0.00E+00 0.00E+00 6.67E-03 0.00E+00 0.00E+00 6.67E-03
Sub: and 2.16E-03 0.00E+00 0.00E+00 6.47E-03 0.00E+00 0.00E+00 2.16E-03
Sub: as 0.00E+00 0.00E+00 0.00E+00 3.53E-04 0.00E+00 0.00E+00 7.00E-05
Sub: at 0.00E+00 0.00E+00 9.90E-03 9.90E-03 0.00E+00 0.00E+00 1.98E-02
Sub: but 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.00E-05
Sub: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: from 0.00E+00 0.00E+00 2.20E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: how 2.07E-03 0.00E+00 4.14E-03 1.24E-02 0.00E+00 0.00E+00 2.07E-03
Sub: if 0.00E+00 0.00E+00 0.00E+00 2.03E-03 0.00E+00 0.00E+00 1.02E-03
Sub: in 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="198"?>
Sub: like 1.62E-03 0.00E+00 3.23E-03 3.23E-03 0.00E+00 1.62E-03 8.08E-03
Sub: that 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 6.29E-03
Sub: then 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: through 3.55E-02 0.00E+00 0.00E+00 4.73E-03 0.00E+00 0.00E+00 1.70E-01
Sub: to 0.00E+00 0.00E+00 0.00E+00 2.63E-03 0.00E+00 0.00E+00 0.00E+00
Sub: what 0.00E+00 0.00E+00 0.00E+00 4.69E-02 0.00E+00 0.00E+00 0.00E+00
Sub: when 0.00E+00 0.00E+00 0.00E+00 1.45E-02 0.00E+00 0.00E+00 0.00E+00
Sub: which 0.00E+00 0.00E+00 1.28E-03 0.00E+00 0.00E+00 1.28E-03 0.00E+00
Sub: while 0.00E+00 0.00E+00 9.62E-03 0.00E+00 0.00E+00 2.41E-03 4.81E-03
Sub: who 0.00E+00 0.00E+00 5.78E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00

SENSES 29 30 31 32 33 34 35 FEATURES
L1: after 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: and 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: but 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: by 0.00E+00 0.00E+00 5.38E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: for 0.00E+00 0.00E+00 5.41E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: forward 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: he 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: how 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 9.16E-04
L1: in 5.08E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: it 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: just 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: like 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: she 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: that 0.00E+00 0.00E+00 1.62E-03 0.00E+00 0.00E+00 0.00E+00 1.62E-03
L1: to 0.00E+00 0.00E+00 2.36E-03 0.00E+00 0.00E+00 0.00E+00 2.36E-02
L1: up 0.00E+00 0.00E+00 0.00E+00 3.95E-04 0.00E+00 4.34E-03 1.97E-03
L1: what 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: when 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.60E-03 0.00E+00
L1: which 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L1: who 0.00E+00 9.62E-03 0.00E+00 0.00E+00 0.00E+00 9.62E-03 9.62E-03
L2: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L2: it 2.60E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L2: out 0.00E+00 0.00E+00 1.26E-01 0.00E+00 0.00E+00 0.00E+00 0.00E+00
L2: that 0.00E+00 0.00E+00 1.62E-03 0.00E+00 0.00E+00 0.00E+00 1.62E-03
R1: after 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: and 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: around 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: as 0.00E+00 1.43E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: away 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: back 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: down 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: for 0.00E+00 0.00E+00 7.73E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: forward 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: from 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: if 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.07E-03 0.00E+00
R1: in 4.48E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: into 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: it 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: like 0.00E+00 1.53E-02 0.00E+00 0.00E+00 0.00E+00 1.70E-03 0.00E+00
R1: on 0.00E+00 0.00E+00 1.39E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: out 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: over 0.00E+00 0.00E+00 5.37E-04 0.00E+00 0.00E+00 0.00E+00 5.37E-04
R1: that 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="199"?>
R1: through 0.00E+00 0.00E+00 5.75E-04 0.00E+00 0.00E+00 0.00E+00 5.75E-03
R1: to 0.00E+00 0.00E+00 0.00E+00 8.23E-04 0.00E+00 9.05E-03 5.76E-03
R1: up 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R1: what 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: around 0.00E+00 1.53E-02 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: as 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: at 0.00E+00 0.00E+00 2.71E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: from 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.10E-02
R2: how 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.07E-03 0.00E+00
R2: if 4.59E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: in 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: into 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: like 0.00E+00 4.21E-02 0.00E+00 0.00E+00 0.00E+00 4.67E-03 0.00E+00
R2: on 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: over 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: through 0.00E+00 0.00E+00 2.36E-03 0.00E+00 0.00E+00 0.00E+00 2.36E-02
R2: to 0.00E+00 0.00E+00 0.00E+00 5.13E-03 0.00E+00 5.64E-02 2.56E-02
R2: up 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
R2: what 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: after 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: and 0.00E+00 8.62E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: as 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: at 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: but 0.00E+00 0.00E+00 6.01E-04 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: for 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: from 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 1.10E-02
Sub: how 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.07E-03 0.00E+00
Sub: if 1.02E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: in 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: like 0.00E+00 0.00E+00 1.62E-03 0.00E+00 0.00E+00 0.00E+00 1.62E-03
Sub: that 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: then 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: through 0.00E+00 0.00E+00 2.36E-03 0.00E+00 0.00E+00 0.00E+00 2.36E-02
Sub: to 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: what 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 2.60E-03 0.00E+00
Sub: when 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: which 0.00E+00 2.56E-03 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
Sub: while 0.00E+00 4.81E-03 0.00E+00 0.00E+00 0.00E+00 4.81E-03 4.81E-03
Sub: who 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00 0.00E+00
<?page no="201"?>
The book is a clear yet challenging overview of both the theoretical and the practical aspects of doing semantic research through the use of language corpora.
Through a hands-on approach, it presents the relevant semantic and corpus-linguistic issues in an accessible way and aims to give readers practical experience of advanced corpus-based research.