Forum Exegese und Hochschuldidaktik: Verstehen von Anfang an (VvAa) 8/2
ISSN 2366-0597 / 2941-0789
Francke Verlag Tübingen
DOI 10.24053/VvAa-2023-0014
This is an open-access article published under the terms of the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Between Ancient Texts and Large Language Models

Christoph Heilig (https://orcid.org/0000-0002-9397-5629)
The public release of ChatGPT in November 2022 marked a watershed moment for teaching in the humanities, as Large Language Models (LLMs) suddenly demonstrated text generation capabilities indistinguishable from human writing. Using biblical studies as a case study, this paper examines how these tools challenge core pedagogical practices across text-based disciplines. The article offers a look into the 'black box' of LLMs through hands-on demonstrations with OpenAI's playground, revealing their nature as probabilistic systems where the next word is selected from weighted possibilities rather than retrieved deterministically. Understanding this proves crucial for students and educators, explaining phenomena like 'hallucinations' and exposing why treating LLMs as citable sources fundamentally misunderstands what they are. Having established how LLMs function, the article examines three areas where these tools challenge traditional teaching in biblical scholarship - and, by extension, the humanities: increasingly sophisticated AI translations question the value of intensive language training; LLMs that process entire books in seconds make existing reading practices appear inefficient; and AI-generated content increasingly rivals scholarly writing. Rather than dismissing LLMs as "stochastic parrots" or embracing them uncritically, the article proposes a balanced response preserving essential practices - particularly slow reading and writing as critical thinking tools - while leveraging LLMs as dialogue partners and research assistants. The humanities' future vitality depends on articulating why human engagement with texts remains irreplaceable, even as these tools are integrated into educational frameworks. Biblical studies thus offers a compelling test case for how humanities disciplines must reimagine their teaching - not through isolated policy adjustments but through fundamental reconsideration of core competencies in an AI-transformed landscape.
Between Ancient Texts and Large Language Models 1
The Future of Pedagogy in Biblical Studies

Christoph Heilig (orcid.org/0000-0002-9397-5629)

1 This essay is part of my project "Theology between Scriptures and AI-Generated Texts," for which I received a scholarship from the Bavarian Academy of Sciences and Humanities.
2 Brown et al., Models.
3 Some theorists would, of course, deny that the products of LLMs can be called 'texts' at all. For example, Schneider, Texturen, 15, prefers the label "intelligible textures": the LLMs lack power of judgement, but their products can be read and interpreted as texts.
4 The recent preprint by Grace et al., Authors, demonstrates this. As their figure 3 impressively shows, within a single year (2022 to 2023), experts in the field have significantly updated their predictions concerning when various AI milestones will first be reached.

1 Purpose and Scope of this Article

Until November 2022, the topic of 'artificial intelligence' (AI) was hardly high on the agenda of most theologians, let alone Biblical scholars. Within theology as a whole, it was a niche topic for systematic theologians investigating transhumanism and for ethicists considering its impact in specific areas of application, such as automated medical diagnoses.
Then, however, OpenAI made ChatGPT available to the public and allowed users, in an unprecedented PR campaign, to chat with their 'large language model' (LLM): a transformer (a special type of neural network, hence the 'T') that was pretrained on hundreds of billions of words from the internet (hence the 'P') 2 and capable of generating something (hence the 'G'), namely text 3 - most notably, text that, for the first time and due to the specific training process involved, seemed to many users to be indistinguishable from text that human authors are capable of producing. This stood in stark contrast to the annoying chatbots most people were used to up to this point from frustrating experiences with customer services. Accordingly, ever since that watershed moment we arguably live in a new era in which, for the first time in decades (after what turned out to be far too much optimism in the beginning), for many specialists in the field and for many outside observers, the original goal of AI research - to create a system capable of competing with humans on a wide range of cognitive tasks (nowadays called 'artificial general intelligence', AGI) - suddenly seems within reach again. 4 This naturally raises the question of how higher education will develop in this new era and, hence, what this will mean for theology as an academic discipline, in particular for the branches of Biblical studies, which deal with ancient texts, their history, and their meaning. In other words, with an eye toward the topic of this volume: what is the role of academic writing on the Bible, particularly in the context of higher education, in this 'age of AI'?

5 See Heilig, Reformation, for a short overview of my thoughts on the subject.
6 For example, it seems telling that among the different AI milestones mentioned in Grace et al., Authors, the creation of a NYT bestseller by an AI seems to have gained the greatest boost in plausibility over just one year - in the eyes of AI experts, who arguably are not the most qualified people when it comes to the quality of literature.

Over the last one and a half years, I have had many opportunities to speak about this topic. 5 This has turned out to be very beneficial for my own thinking about the subject matter. After all, regularly revisiting the current state of affairs has sensitized me to two interrelated dynamics. On the one hand, it has helped me to avoid 'moving the goalposts' myself - and to notice when this occurs in debates about the future of theology and Biblical studies in higher education. What exactly constitutes the threshold that should make us worried about the future of our discipline and should prompt us to begin a careful and critical self-examination of how we want to enter this new period? I remember times - as distant as they might already seem, this was just a couple of months ago - when it was common to hear objections such as "ChatGPT does not know Greek" or "ChatGPT does not have access to the internet." While at one point some people ridiculed the idea of students writing short essays with the help of LLMs, the new standard (I am writing this in March 2024) seems suddenly to have become that LLMs cannot yet, after all, write entire Master's theses. On the other hand, regularly comparing one's predictions to the actual developments that ensued also helps one to become more sensitive to hype - realizing which bandwagons one might have jumped on prematurely. Since the potential implications of this new technology are so far-reaching, it is rare for any individual to have expertise in all the relevant areas that are affected by the suspected changes.
Accordingly, no one is immune to underestimating certain challenges that LLMs might face or to overestimating their capabilities. 6 Marrying LLMs and the internet, the integration of plugins into LLMs, and the emergence of autonomous AI agents are just the most well-known developments that have not lived up to the hype around them.

I write this article against the backdrop of this still very short history of LLMs since the release of ChatGPT in November 2022. It seems to me that certain trajectories concerning the technical development are discernible at this point in time without unduly entering the realm of speculation. However, we do need to be careful not to build calls for transformations of important traditional pillars of pedagogy in higher education on shaky ground that is still being laid. Therefore, in what follows I will mostly abstain from pointing to very specific recent developments on the one hand or to merely potential future developments on the other, knowing that the former comments will most certainly already be outdated at the time of publication of this article and that the latter predictions might already be proven wrong when you read this. I will, hence, stick to very basic considerations concerning what LLMs are and what they are not (at least not yet - and, I would venture to guess, also not yet at the time when this article gets published) and how this might interact with some of the most basic tenets of the pedagogy of Biblical (and theological) studies - and, at least in part, the humanities in general. Of course, many much more specific considerations are in order. But I hope that even this overview will turn out to be stimulating to many readers. And perhaps this more modest approach will allow it to remain relevant for some time. We will see. Pretty soon.

2 Beyond (Im)Practical Advice

I come to this question of the impact of AI on the field of theology (and the humanities in general) as a New Testament scholar who is wondering about the future of his own field of research and teaching. This is an admittedly very limited perspective. I also address this nexus of questions, however, from the specific vantage point of somebody who has been following developments in the area of LLMs for some time, due to my research foci on probability theory on the one hand 7 and text linguistics on the other. 8 These rather obscure research interests have thus ultimately served me well, preparing me for the shock induced by ChatGPT. After all, at the most basic level, LLMs are neural networks trained on large text corpora to estimate the conditional probability distribution over the next token given the preceding context. A decoding procedure then uses this distribution to produce coherent text. It is this basic characteristic of LLMs that in my view should be foundational to any dialogue about how this new technology can, will, and should impact the didactics of Biblical exegesis.

7 Heilig and Heilig, Methodology. See also https://theologie.unibas.ch/en/departments/new-testament/bayes-and-bible/.
8 See, in particular, Heilig, Erzähler.

Unfortunately, there seems to be a trend in higher education to make up for the long-standing failure to prepare for the challenge that LLMs were ultimately to pose by now imposing a variety of rather specific regulations, tailored to very specific areas of application in academic writing. While it is understandable that those who teach at the university wish to receive some guidance with respect to the degree to which their students are allowed to use LLMs in their assignments, bypassing this foundational level naturally leads to guidelines that not only fail to do justice to the phenomenon but even come with very detrimental effects on how students conceptualize LLMs.

As a point of departure, take, for example, the way the influential MLA Style Center recommends "cit[ing] AI". 9 While they do not "recommend treating the AI tool as an author", various universities have applied the general recommendations in a way that evokes just that impression. For example, the University of Basel to this day (March 2024; still true at the time of preparing this essay for publication, March 2025!) has a guideline on its website about "how to cite from AI" (German: "Aus KI zitieren"). 10 This is far from just a practical help for students. Even the title comes with heavy presuppositions, implying that AI is something like a source, similar perhaps to Wikipedia, a realm from which we could draw a certain piece of information in a controllable, repeatable manner. This idea also stands behind the very practical suggestion of how a student should cite text from LLMs. For example, according to this guideline, a student would have to cite a definition of "theology" provided by ChatGPT as follows:

Theology can be defined as "the study of religious beliefs, practices, and experiences, focusing on the nature of the divine and its relationship with the world" ("What is theology?", output of ChatGPT, March 25, 2024).

Such practices will, of course, reinforce the impression that ChatGPT is something like a user-friendly database. If we encourage students to follow such practices, we should not be surprised if this influences how they perceive LLMs and if this, in turn, shapes the way that they employ them in the generation of texts for their studies!

9 MLA Style Center, AI.
10 Universität Basel, KI.
11 OpenAI, Technical Report.

3 Playing on the Playground

There is, thus, no more basic pedagogical need than to give students a solid idea of what an LLM actually is in the first place.
A very simple first step toward that goal is to be more specific when referring to 'AI'. It is true that the capability of processing images that came with the introduction of GPT-4 11 was indeed revolutionary in that it allowed a chatbot like ChatGPT to become multi-modal. Other innovations, such as the opportunity to speak with ChatGPT in the app version thanks to the integration of the transcription tool Whisper, are likewise true breakthroughs and must not be downplayed. However, the claim of GPT-4 showing "sparks" of AGI 12 has certainly turned out to be quite optimistic. Moreover, other innovations - such as giving an LLM access to the internet, the integration of plugins into ChatGPT, and the development of autonomous AI agents - have not had the kinds of effects that the hype around their introduction might have suggested. LLMs, thus, are still most adequately understood as just that: neural networks trained on vast text corpora to estimate conditional probability distributions over the next token, enabling them to produce coherent and contextually plausible text that appears human-like to readers. Imprecise talk about 'AI' only detracts from that fact and might hinder students from realizing this basic nature of LLMs. To my mind, the best tool for students (and their teachers) to understand what this means in practice is still the legacy 'completion' function in the OpenAI playground. 13

12 Bubeck et al., Sparks.
13 https://platform.openai.com/playground?mode=complete. Update: At the time of publication, this function has unfortunately been discontinued. Through accessing GPT via the OpenAI API (with logprobs enabled for token probabilities) or web-based tools like TokenProbe (tokenprobe.cs.columbia.edu, which uses Llama 3.1 models), one can still achieve a comparable effect.
14 This tool shows how text is divided into tokens: https://platform.openai.com/tokenizer.
Here, various settings allow for adjusting the output of the GPT model that produces the text (a version of GPT-3.5). Moreover, and most helpfully, it is possible to turn on the switch for "show probabilities". Enabling this option results in the produced text being marked by various color shadings, indicating the probabilities of the respective tokens. The following figure demonstrates what the text produced in reaction to a prompt asking for a definition of "theology" might look like:

Fig. 1: ChatGPT results with colored markings indicating the probabilities

Hovering over the text will show us the probabilities for each 'token' - basically each element of the text for which GPT had to make a choice. (This can be a whole word but is usually only a combination of several letters; tokens also include spaces and punctuation.) 14 There is a lot of green in this text, indicating the high probability of the tokens that were chosen in the process of text production. Note that even though this precise question most probably did not occur many times in the training data, it shares similarities with many other texts found in this corpus. On this basis, GPT-3.5 is able to predict that a question such as "What is theology?" is usually followed by a blank line and then by the word that is being asked about in first position. This pattern of probability also dictates that there is an almost 99% chance of "theology" being followed by the word "is". However, after "Theology is the study of", the LLM for the first time has different plausible options to choose from: "the" (51%), "religious" (32%), "God" (8.5%), "religion" (7.9%). (Any concrete percentages you may see in a UI are illustrative for that specific model, prompt, and settings.)
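The selection step just described can be sketched as weighted random sampling over a next-token distribution. The probabilities below are merely the illustrative values quoted above for the context "Theology is the study of"; they are not retrieved from any actual model here.

```python
import random

# Illustrative next-token distribution after "Theology is the study of"
# (assumed values from the discussion above; a real model would compute
# these from its learned weights at every position).
next_token_probs = {
    "the": 0.51,
    "religious": 0.32,
    "God": 0.085,
    "religion": 0.079,
    "<other>": 0.006,  # long tail of rarer continuations
}

def sample_next_token(probs, rng=random):
    """Pick one token, weighted by its probability ('rolling a weighted die')."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Repeating the 'roll' many times approximates the stated percentages:
counts = {t: 0 for t in next_token_probs}
for _ in range(10_000):
    counts[sample_next_token(next_token_probs)] += 1
print(counts)  # "the" is chosen roughly half the time, but not always
```

Each individual run is a fresh roll of the weighted die, which is exactly why the same prompt need not yield the same continuation twice.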
Even just a little experiment like this will demonstrate to students an important implication that stems from the very nature of LLMs. The texts produced by chatbots that make use of LLMs (assuming a sampling decoder is used) are not deterministic but involve an element of chance. (Pressing "regenerate" in ChatGPT can also illustrate this to some extent.) The appropriate metaphor for an LLM is, however, not that of a monkey randomly 'writing' on a typewriter. After all, the probabilities of the different tokens are far from equal. When an LLM produces a text, this is more like rolling a die - a weighted die! - at each position where the text could take a different turn. In other words, if we do not further influence this specific LLM (note that the "temperature" is 1; more on that in a moment) and we let it produce a great number of texts after the prompt "What is theology?" that start with "Theology is the study of", 51% of these texts will use "the" as the next word, 32% "religious", 8.5% "God", and 7.9% "religion" - and in the remaining cases (less than 1%) we will get something a bit more surprising (such as "religions", "concepts", "beliefs", "divine", "belief", etc.). Even on this basis it is already clear why 'citing AI' as suggested in the guideline mentioned above is misleading - in fact, totally absurd. The LLM does not access a database from which it could extract a single 'correct' answer. Rather, it composes an answer by decoding from its learned distribution, and given the settings usually used in chatbots such as ChatGPT, the same prompt can yield different phrasings across runs (and certainly across model updates). This becomes even clearer if we take a look at the different settings that we can adjust.
I want to mention only the most important one here: the 'temperature' has a huge impact on the kind of text that is produced. To extend the metaphor, it has to do with how 'hot' we allow the machine to run. If we choose zero, the most probable token is preferred at each selection step (greedy decoding), i.e., the generation of text will indeed be deterministic. Accordingly, the resulting text will be very predictable - and, thus, boring. Figure 2 demonstrates this. The green highlighting indicates that the tokens in question were the clear favorites at these positions in the text.

Fig. 2: ChatGPT result with 'Temperature' set to 0

If we turn the temperature up, this is equivalent, so to speak, to reshaping our biased die so that its less probable sides land face-up more frequently. For example (see figure 3), if we turn the temperature up to 1.5, the predictable "theology" is followed by the rather improbable "can" (0.99%).

Fig. 3: ChatGPT result with 'Temperature' set to 1.5

The different shadings of red visualize these more improbable - unexpected - choices. Such texts can seem less formulaic - and at times quite creative! (Well-informed students of course already make use of this circumstance in order to hide the fact that they use LLMs in their assignments …) 15 However, if we turn the temperature up even more, we quickly enter a realm where we are faced with nonsensical sequences of tokens (that no longer deserve the designation "text"), as shown in figure 4. (The precise threshold is model- and prompt-dependent.)

Fig. 4: ChatGPT result with "Temperature" set to 1.75

While the LLM begins with the rather conventional phrase "Theology is the study of …", it then quickly goes off the rails. 16 In my view, it is very helpful to have students observe this transition of the LLM from a generator of coherent text to a producer of mere gibberish at high temperatures. Among other things, it helps them to understand how it is possible that an otherwise so remarkable dialogue partner at times simply seems to 'make up' stuff - including quotes and sources from the secondary literature that do not exist. After all, given that quotation marks are tokens that at any point in the text are possible though mostly unlikely choices, it is understandable that with certain settings and in certain contexts they can, of course, appear in a text generated by an LLM. For example, after an initial text such as "Theology can be defined as", there is a 10% chance of an opening quotation mark (for the model that we are looking at here). It is entirely possible that it is chosen in the generation process within ChatGPT (which seems to be using temperatures around 0.7-0.8). And in this new context, a closing quotation mark at some point further down in the text is something that the LLM can indeed expect from the patterns extracted from the training data. And that this, in turn, is followed by tokens that resemble human names is also entirely expected on this basis. Looking at the phenomenon from this angle, 'hallucinations' are not at all surprising.

15 Other options that can serve the same goal (frequency penalty and presence penalty) have to do with avoiding repetition.
16 In order to still get an intelligible text, we would have to set the value for "Top P" to a lower level. For example, setting it to 0.5 would mean that the least likely 50% of theoretically possible options would be excluded. In our analogy with biased dice, this would mean first deforming the die so that less likely options become more probable results (temperature) - but then erasing the least likely surfaces before actually throwing the die.
17 Markschies, Intelligenz.
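The effects of temperature and top-p can be illustrated directly on a toy distribution. The sketch below reweights assumed probabilities (not model output) by a temperature and applies nucleus (top-p) truncation, i.e., it keeps only the most likely tokens until their cumulative mass reaches the threshold; at temperature 0 it reduces to greedy decoding.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a distribution; low T sharpens it, high T flattens it."""
    if temperature == 0:  # greedy decoding: only the single most likely token
        best = max(probs, key=probs.get)
        return {best: 1.0}
    weights = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def top_p_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative mass reaches top_p."""
    kept, cumulative = {}, 0.0
    for token, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[token] = p
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(kept.values())
    return {t: p / total for t, p in kept.items()}

# Illustrative next-token distribution (assumed values):
probs = {"the": 0.51, "religious": 0.32, "God": 0.085,
         "religion": 0.079, "wait": 0.006}

print(apply_temperature(probs, 0))    # deterministic: only "the" survives
print(apply_temperature(probs, 1.5))  # rare tokens like "wait" gain probability
print(top_p_filter(probs, 0.5))       # the unlikely tail is cut off entirely
```

This mirrors the die analogy above: temperature deforms the die, and top-p erases its least likely faces before the throw.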
Helping scholars and students of Biblical and early Christian texts develop this perspective prevents them from approaching LLMs with unrealistic expectations. Note, for example, that Christoph Markschies 17 supports his rather optimistic prediction concerning how theology as an academic discipline will fare in the face of LLMs by pointing out how unreliable ChatGPT was in compiling his CV. However, the appearance of "Heidelberg" in a theologian's educational biography is not at all surprising. After all, GPT will only have 'read' very few texts about this specific individual during its training, but it will 'know' that the city often occurs in conjunction with theological training. What actually should surprise us - and, in fact, did surprise many experts - is that current generations of LLMs are capable of producing, at times (if not often), text that actually coheres well with reality as we perceive it. After all, LLMs do not 'understand' parameters of the real world such as causality. They have not experienced gravity as we did in our embodied existence in the process of growing up, discovering the world, and acquiring language. Still, they are remarkably reliable when asked how certain objects would behave under specific physical conditions. And even though they, in principle, do nothing but guess the next element of the text based on frequencies in their training data, they can by now also produce text about theological topics that in many cases rivals what students, and sometimes even scholars, would be capable of writing on these questions. That - with the right training and the right settings - we can cross over from nonsensical word salad into coherent, useful text at all deserves our amazement and suggests that large neural networks learn internal representations that correlate with real-world structure.
We would be well-advised to keep in the forefront of our minds the fact that this was a surprising development even for many experts in the field, as we will increasingly be bombarded with LLM-generated texts and take them for granted as parts of our daily lives. Approaching LLMs with this attitude comes with a twofold implication.

First, it will caution us and our students against naively relying on LLMs as a kind of all-knowing dialogue partner, a quotable resource, a monolithic database. To be fair, this does not mean that LLMs should play no role at all in the research processes of students. Merging generative AI and internet access in a productive and reliable way will undoubtedly be a cornerstone of future search engine development. (At the time of the revision of this essay, in March 2025, this trend is well under way.) And, in fact, even without searching the internet, LLMs might justifiably be used as sources of factual information - as long as potential sources of error are adequately taken into account. After all, there is likewise nothing inherently infallible about running a Google search and relying on the information that the algorithm preselects at the top of the results. Analogously, just because there is a stochastic element to the responses of LLMs, this does not mean that they are all equally unreliable. In the example provided in figure 5, at T=0, students would always get the correct answer from GPT-3.5. At T=1 (and with no other settings that would influence the choice of tokens), still more than 98% of inquiries would receive the right response. On average, only 1 in 1000 prompts of this kind would directly result in the wrong answer "26".

Fig. 5: ChatGPT-3.5 showing result probabilities

However, depending on the frequency and consistency with which the correct information occurs in the training material, responses of an LLM might not meet the standards required for academic studies. For example, as figure 6 demonstrates, when asked about the oldest canonical Gospel, (the now outdated) GPT-3.5 shows a tendency that does not match the consensus in the scholarly literature.

Fig. 6: ChatGPT-3.5 showing questionable probabilities

Helping students to achieve an intuitive grasp of these dynamics is far better than either lulling them into a false sense of security by telling them how to 'cite' LLMs or, alternatively, prohibiting them from consulting LLMs at all.

Second, understanding the probabilistic mechanism underlying LLMs can help us as scholars and teachers appreciate just how astonishing the increase in quality of LLM-generated texts over the last couple of years has been. And this, in turn, can prevent us from downplaying the significance of this technology for academia. 18 The word strings produced by LLMs come shockingly close to texts produced by experts. The obvious limitations of LLMs do not diminish this recognition. Rather, the fact that LLMs work so well despite only using patterns of probability at their basis should make us even more humble.

18 Unfortunately, one still encounters supposed counter-examples that allegedly demonstrate the low quality of texts produced by LLMs but that make use of the old GPT-3.5 version. Of course, any such tests should use the GPT-4 version (cf. OpenAI, Technical Report) or similar LLMs, such as Claude 3.5 from Anthropic.
19 Bender et al., Parrot.
20 Hubert et al., State.
21 If such text from the training data is in fact reproduced verbatim, this can result in copyright infringement. Several court cases concerning such alleged reproductions are currently pending.
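The reliability arithmetic just discussed can be made concrete. Assuming, as above, that a single query returns the wrong answer "26" with probability 0.001 at T=1, one can compute how likely errors become across many independent queries, and how strongly a simple best-of-three majority vote (an illustrative mitigation, not a feature of ChatGPT itself) suppresses them.

```python
# Illustrative reliability arithmetic for the figure-5 discussion:
# a single query is wrong with probability p_wrong (assumed 0.001 at T=1).

p_wrong = 0.001

def any_wrong(n, p=p_wrong):
    """Chance that at least one of n independent queries is wrong."""
    return 1 - (1 - p) ** n

def majority_wrong(p=p_wrong):
    """Chance that a best-of-three majority vote is wrong (2 or 3 bad samples)."""
    return 3 * p**2 * (1 - p) + p**3

print(any_wrong(100))    # ~0.095: across 100 students, some error is likely
print(majority_wrong())  # ~3e-06: voting over samples suppresses the error
```

The point for the classroom is the asymmetry: a 0.1% per-query error rate is negligible for one student but nearly guarantees occasional wrong answers across a whole cohort, while aggregating samples drives the error down sharply.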
It is against the backdrop of this realization that I find the metaphor of the “stochastic parrot”[19] quite problematic. While it correctly captures the element of chance and the lack of conscious communication, in my view it unduly downplays the creativity of today’s LLMs, which, as more recent studies demonstrate, can complete many tasks that are generally viewed as indicative of creative processes far better than most human beings.[20] Moreover, even though the text emerges on the basis of statistical patterns extracted from the training data, it is not the case that LLMs simply ‘parrot’ specific source texts, as the metaphor might imply to some who are not familiar with how LLMs function.[21] The challenge that human authors of texts face in light of recent LLMs is real, and we had better face it if we want our discipline to remain relevant. As I go through this paper for final revisions before publication, in March 2025, I must add one more comment: While the recent release of GPT-4.5 is certainly underwhelming compared to what the 4o model could already do, the introduction of reasoning models (especially o1 and o3 from OpenAI and DeepSeek R1) has added a new dynamic. As it turns out, it is surprisingly easy to improve the quality of LLM outputs if one keeps the model from producing an answer directly and instead forces it to first simulate a kind of internal monologue, a chain of thought, in the background. What is most striking is that, through rather simple learning mechanisms, these models can develop patterns of self-evaluation that then dramatically increase the quality of the final response across the board, for all kinds of questions.[22] While these models do not ‘reason’ like human beings, it is still impressive to see how well these mechanisms work for increasing coherence and inhibiting hallucinations.

[22] Muennighoff et al., Simple Test-Time Scaling, showed that a large language model (Qwen2.5-32B-Instruct) can be turned into a strong reasoning model, s1-32B, by finetuning it for just 26 minutes on 1,000 carefully chosen examples. Despite this tiny training effort (costing only ~$50), the model outperformed OpenAI’s o1-preview on several math benchmarks. They also introduced a simple trick called “budget forcing”: by making the model generate the word ‘Wait’ before answering, it was encouraged to ‘think’ longer and correct itself, leading to even better results.
[23] Markschies, Intelligenz.

4 Challenges for Theology in Higher Education

4.1 Teaching and Research

Having ‘played’ our way to an understanding of what LLMs are and are not, and of how precisely we normally encounter them when they are integrated into chatbots, we can now tackle the question of what this implies for teaching theology, and especially the Bible, in a higher education context. Many of the issues that arise here apply to both students and researchers. If, for example, LLMs are capable of producing text that could be mistaken for a student’s essay, this also calls into question whether, on a societal level, there is a need for researchers whose primary purpose apparently consists in writing books, that is, producing lengthy texts, on questions that, so it seems, LLMs are capable of answering as well. Christoph Markschies has made the point that ChatGPT lacks the precision for its answers to be competitive with the assessments of human scholars.[23] I remain skeptical. The issues that he mentions are, on the one hand, nothing that seems insurmountable with finetuning and, on the other hand, probably strike many ‘ordinary’ people as hair-splitting, which then raises the question of whether taxpayers’ money is well spent on funding such research at all.
Against that backdrop, I would advise against approaching the impact of LLMs on teaching in the humanities in an isolated fashion, as if we were dealing with a narrow issue concerning students only. An example would be the question of whether we should focus on in-person exams and oral examinations instead of written homework. Rather, we must recognize that LLMs have indeed begun to encroach upon the core competencies of academic readers of the Bible, at whatever educational level. In what follows, I want to comment briefly on three such areas where LLMs pose a challenge to students and researchers/teachers alike and where, in my view, a critical examination of our self-understanding as theologians/Biblical scholars is in order.

4.2 Ancient Languages

While the last two issues that I want to explore (reading and writing) are relevant for basically all disciplines within the humanities, I want to begin with an emphasis that is unique to Biblical studies and some other disciplines dealing with pre-modern texts, namely the acquisition of rather elaborate expertise in ancient languages. In the German-speaking university system, this means that students who want to become pastors still need to study Hebrew, Greek, and Latin. While some voices have challenged this philological emphasis,[24] the Protestant faculties just recently confirmed their commitment to keeping all three languages in their curriculum.[25] For students, this means a significant investment of time and effort into acquiring competencies that, with the rise of LLMs, might seem increasingly irrelevant to many.

[24] Lauster, Licht.
[25] Evangelisch-Theologischer Fakultätentag, Beschlüsse 2023.
[26] Cf. Streett, Professors.
After all, what is the point of learning Greek stem forms if there are now tools that can answer most questions about ancient Greek texts far more reliably than most graduates of theology will ever be able to? We had better have a good answer to that, especially since most professionals in the field arguably never acquire the vocabulary of a four-year-old native speaker.[26] From my own anecdotal experience, I would postulate that many students of theology still actually want to learn these languages. But any conversation about the future of the didactics of ancient languages in the context of Biblical studies must take its point of departure from a dialogue with these students. We need to learn what the goals behind such a desire actually are, and whether these goals align with why we as teachers deem these language competencies important for them. Is their goal simply to be able to “read the Bible in its original languages”? Then there can be little doubt that the way Greek is taught in most contexts of higher education right now is not conducive to this purpose. Teaching ancient languages as living languages is demonstrably much more effective in this regard, especially if the alternative is the idea of teaching Classical Greek to readers of Koine Greek texts. By contrast, what if the goal is to enable students to become competent analysts of language in general? Then, I believe, one would have to ask whether more general linguistic competencies would not need to become much more central.

I suspect that it is in the dialogue about this issue that we will discover a significant discrepancy between what students are looking for in learning ancient languages and the motives of their teachers for insisting on the relevance of this part of their education. This is at least what I would expect, given that many of these teachers understand themselves as Biblical scholars and, as such, need the ancient languages for purposes that are quite different from the ways in which their graduates, especially if they end up as pastors, will make use of this part of their knowledge in their daily lives. It requires a different kind of proficiency in a language to read through large portions of primary texts and detect very specific syntactical phenomena than to prepare a sermon and understand the arguments made about these texts in the secondary literature. I thus sense that, with respect to the proper place of ancient languages in Biblical studies, it will be most difficult to reach a consensus that satisfies both students and teachers, because the matter here concerns not just what LLMs are capable of but also normative perspectives on what ‘a theologian’ (or ‘professional reader of the Bible’) should be able to do themselves.

4.3 Reading

There are two other areas, however, in which I would suspect much more overlap in actual practices, with the result that students and their teachers (in their role as researchers) face very similar problems, namely with respect to reading and writing, arguably two core competencies that every student and scholar of the Bible must develop as part of their studies/research. As of now, Biblical studies, as well as theology and the humanities in general, are inconceivable without rather extensive secondary texts on the subject of study, written in modern research languages. Students have to read textbooks, commentaries, monographs, and articles about the Biblical texts that they are supposed to learn more about. When ChatGPT was made available to the public, the original context window for any interaction consisted of only 4,096 tokens.[27] That means that for every response, ChatGPT could only take into account ca. 2,000-3,000 words of the prior conversation.[28] In summer 2023, OpenAI released GPT-4 with 32k tokens to some paying customers, and their competitor Anthropic made their LLM Claude 2 widely available, which could take into account up to 100k tokens.[29] For the first time, this meant that whole articles and even small books could be processed at once by an LLM. Before that point in time, the only way to ask an LLM questions about a larger document was to make use of techniques such as creating embeddings, that is, vectorizations of text data. A user’s query would be turned into an embedding, compared with the embeddings of the sections of the larger text, and answered on the basis of the most relevant section. While this approach significantly improved the ability of LLMs to provide detailed and contextually accurate responses, it was not suitable for questions that required the text to be analyzed as a whole. As I am writing, in March 2024, we now have an LLM at our disposal, Claude 3 from Anthropic, that has a context window of 200k tokens.

[27] Brown et al., Language Models.
[28] If this seems incredible to you because you had very long conversations with ChatGPT at that time and still felt ‘understood’, this only underlines how little context is actually required for a conversation to succeed. That in and of itself comes with some challenges for specific research emphases in Biblical studies, for example tradition history.
[29] Anthropic, Claude 2.
[30] Anthropic, Claude 3.
[31] Heilig, Apostle.
[32] Google’s Gemini 1.5 Pro will push this into the realm of 1 million tokens. But this version is not yet widely available. Gemini Team, Gemini.
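The embedding-and-retrieval workaround described above can be sketched in a few lines. The ‘embeddings’ here are toy word-count vectors and the document sections are invented snippets; production systems use dense vectors from a neural encoder, but the selection logic is the same:

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words frequency vector.

    Real pipelines use dense vectors from a neural model; the retrieval
    logic (compare the query vector to section vectors) is identical.
    """
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b) or 1.0)

# A long document is split into sections, each embedded once (invented snippets).
sections = [
    "Paul wrote letters to several early Christian communities.",
    "The verb thriambeuo evokes the Roman triumphal procession.",
    "Mark is widely regarded as the oldest canonical Gospel.",
]
index = [(s, embed(s)) for s in sections]

def retrieve(query):
    """Embed the query and return the most similar section."""
    q = embed(query)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

print(retrieve("What does thriambeuo mean?"))
# → "The verb thriambeuo evokes the Roman triumphal procession."
```

Only the single best-matching section would then be placed in the LLM’s context window, which is precisely why questions about the document as a whole remained out of reach before long context windows arrived.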
Importantly, Claude 3’s ability to recall any section of the input text and to find small details in it (tested by so-called ‘needle in a haystack’ tests) is impressive.[30] For example, when I gave the Opus version of Claude 3 the entire text of my recent book about Paul’s critical interaction with the Roman Empire,[31] added a single sentence claiming that the apostle actually ‘admired’ the Roman emperor Claudius, and then asked the LLM about a potential statement that did not fit the general argument about Paul’s view of Claudius, the LLM was immediately capable of identifying the addition and quoting it verbatim. As context windows are extended into the realm of millions of tokens,[32] and given the impressive recall capabilities that allow very precise answers to complex questions about the text, the question naturally arises of how this will impact reading practices, across the board, but also in Biblical studies. One thing is for sure: it will no longer be possible to force students to read through required course texts by having them write reading reports or answer questions on the text in writing. While this development is, in my opinion, actually a good thing (because there has been limited value in students quickly reading through a text in order to answer some random questions anyway), it underlines my suspicion that over the coming years we will have to scrutinize our theological curricula very closely. For if we wish to continue to attract students, we must make sure that our curriculum leaves students enough freedom to actually read texts without feeling time pressure. Otherwise, they will just leave this unpleasant task to an LLM, or avoid this course of study altogether.

[33] Baldi and Mejia, Slow Reading.
And that would be a shame, in particular because, in times in which we have gotten used to scrolling through social media and superficially skimming texts, the practice of ‘slow reading’ might actually be a pull factor for studying theology, as a decelerated counterpoint to our hectic digital lives. And there is good evidence that slow reading is very advantageous for learning.[33] Offering a context in which prospective students can read foundational texts on fundamental questions of humanity can thus be framed in a very appealing way, provided we ensure that the structural conditions for such an experience are met. And ensuring that might come with some uncomfortable conversations, especially in the field of theology, where century-old divisions of labor into different specializations make every credit point a potential point of contention. But perhaps the optimism that I mentioned can encourage us to nevertheless enter into this difficult dialogue. We might be motivated to come to a constructive result by the prospect that classical reading and the new opportunities opened up by LLMs might very well supplement each other, in a way that makes studying theology more and not less attractive. Why not, for example, have the students chat with the book they just read? This could help students get a more precise understanding of what the author meant, with the LLM highlighting elements that the student might have missed. Additionally, with the help of LLMs it becomes possible to inquire about what certain arguments merely imply. And who truly thinks that it would not be intellectually stimulating for a student to debate with a chatbot based on the apostle Paul’s writings what ‘he’ would think about certain contemporary political developments, perhaps even confronting ‘him’ with the current secondary literature on the subject?
At least in the current situation, where we still lack highly finetuned LLMs that would give more precise answers on matters related to Biblical studies, one very natural and promising option suggests itself: have an LLM write a report on a book that the students read, instead of having the students do this themselves, and then encourage dialogue about that assessment and create opportunities for the students to push back against aspects of that review. Used in this way, the LLM serves both as a replacement for reading and, at the same time, as a stimulus for students to delve deeper into the text in question.

4.4 Writing

For students (and researchers), reading often leads to writing. The more the context windows of LLMs grow, the more attractive it of course becomes to outsource writing to these systems too. Gone are the days when a student essay could easily be detected as AI-generated because it lacked the kind of detail that would require the consultation of specialized literature. Sure, we are not yet at the point where autonomous AI agents search for the relevant secondary literature themselves. (Update: At the time of revision, in March 2025, huge progress has been made in that regard.) However, with large context windows and digital files of the research literature at their disposal, it is at least possible for users to create highly technical commentaries on Biblical texts without any knowledge of the original languages, simply by providing the LLM with the sections of the commentaries that are supposed to be consulted.[34] That we might witness a trend toward the automation of literature reviews, something that began several years ago in the natural sciences,[35] is probably not something that will worry Biblical scholars too much. However, I also think the just-mentioned capabilities of present-day LLMs call into question at least one entire genre of secondary literature in our field, namely the Biblical commentary, at least in its current form, which has many problems, as evidenced by the many cases of plagiarism in recent years.[36]

And ultimately, we must reckon with the possibility that even highly specialized monographs might not be safe forever. We must self-critically raise the question of how complex the research processes behind these projects actually are. For example, I wrote an entire book on the question of what the Greek verb θριαμβεύω in 2 Cor 2:14 means and what Paul wants to achieve by using this metaphor of the Roman triumphal procession.[37] This research involved the analysis of all occurrences of this verb and related lexemes in the relevant time frame in ancient literature. As I have described elsewhere, progress in the ‘digital humanities’ had already made this research a lot less time-consuming than the attempts by other scholars to answer the same question only a couple of years earlier.[38] Still, the research required a lot of effort. Yet the research strategy itself that I employed was not at all complex; it could indeed be devised by a current LLM and could arguably be implemented by an autonomous AI agent in the not-so-distant future.[39] This development forces us as Biblical scholars to re-examine our self-understanding: What importance can we expect our fellow citizens to assign to our role in society if we conceive of our function as consisting primarily in the ‘production of books’, when works seemingly comparable to our own achievements might soon become available with just a few mouse clicks and a few minutes of an AI agent’s processing time?[40]

Against this backdrop, many of the current discussions about how exactly we can still get students to write on their own seem quite narrow-minded to me. It seems to me that we rather need to pose the more fundamental question of to what extent writing should, in fact, still be considered a key qualification for emerging theologians. Over the coming months and years, LLMs will be integrated into commercial office products, and it will soon be as natural to use them to revise entire sections of written work as autocorrection has already become to us. Can we, in this emerging context, still expect students to be enthusiastic about the prospect of becoming ‘good writers’? I think a nuanced answer is required here. On the one hand, it seems probable to me that insisting on writing as an end in itself would indeed further contribute to the widespread reputation of theology as an outdated discipline. And I do think that we should not shy away from uncomfortable questions about the true value of the written assignments that we are used to giving to students. If the purpose of a certain writing process is only that a certain text with a specific semantic and pragmatic profile emerges, why should students not delegate this task to an LLM?

[34] Claude 3 and Heilig, Exegesis.
[35] Wilder et al., Forschungsintegrität.
[36] Heilig, Erzähler, 1014-1015, and Heilig, Apostle, 129-134.
[37] Heilig, Triumph.
[38] Heilig, Apostle, 127-129.
[39] As I revise this piece for publication in March 2025, OpenAI’s “Deep Research” (which runs on the reasoning model o3) has given us a glimpse of how this might look in the future. We are not quite there yet, but the idea of an AI research agent that writes whole PhD theses certainly no longer seems that implausible.
[40] This will still come at a certain cost, because these tasks require a lot of energy. For a current estimation of just how much energy is required for such tasks, see now the preprint by Luccioni et al., Processing.
It might subvert our intentions as teachers when we use these products for grading, but it can hardly be denied that the tool seems adequate for the goal in question. If one still wishes to use such texts for grading, I think the most reasonable strategy is not to forbid the use of LLMs but rather to allow it while raising the expectations concerning the quality of the submitted result, with respect to both the content and the form of these texts. Ultimately, I remain mildly optimistic that while it might indeed be easy to fulfill minimum requirements by employing LLMs in a rather mindless fashion, the production of truly outstanding texts will still require significant human input in the future. Note that even if a student tries to automate a task such as the production of a commentary on a Biblical passage (a typical task for an exegetical ‘Proseminar’ in the German-speaking system), they will still have to carefully read through each output of the LLM and compare the result to what they already had in mind.

[41] Cf. Finnern and Rüggemeier, Methoden, 294, on the ‘seminar paper’ as the result of research and an act of teaching (i.e., showing the teacher what one has learnt).
[42] Update from March 2025: This is, of course, only possible if the task is so complex, or there are sub-tasks that need to be justified by the student orally, that the final output cannot be accomplished by a reasoning model on the basis of a single prompt.
[43] Update from October 2025: Going through the proofs before publication, I must of course acknowledge the Nature Reviews Bioengineering editorial “Writing is Thinking”, which has attracted a lot of attention. While it rightly emphasizes writing’s cognitive value, it conflates writing with thinking rather than recognizing writing as one tool among others for reflective thought.
After all, the non-deterministic nature of LLM-based chatbots such as ChatGPT (see above) means that at each turn a variety of textual decisions are possible that can be evaluated. If a student truly immerses themself in this process, the resulting exegetical paper will of course not document their ability to carry out several ‘methodological steps’,[41] but it will be the result of a process during which the student will have been heavily invested in the structure and meaning of the text and in the different discourses surrounding its various dimensions.[42] And if this comes close to what we ultimately want our students to achieve during their studies, we should be happy to accept such texts for assignments. On the other hand, there is also a chance embedded in this challenge. For if we decide to do away with writing tasks that are result-oriented because LLMs have made them obsolete as a means of testing student effort, this might at first sight decrease the role that writing plays in the study of theology, but it might ultimately allow writing to come into view in a more limited yet very relevant way, namely as a tool for critical thinking. Especially in an age when more and more writing tasks will be automated (e.g., daily communication by email), the opportunity (!) to think about a Biblical text by writing about it might actually be very enticing to prospective students. However, it is crucial that we make a positive case for writing, and hence for our discipline. So far, I would wish to see more confidence in that respect.[43] It would be well justified. After all, there is ample evidence for the importance of writing for reasoning in general,[44] and even for wide-reaching cognitive benefits of handwriting in particular.[45]

Moreover, there is a chance of finding an appropriate, and prominent, place for writing (and also for speaking) not just as part of the process of thinking about a subject but also as part of a larger process of communication. While concerns about LLMs replacing human dialogue partners should not be swept aside, there also seems to be potential in the rise of LLMs as go-to resources for questions that students might have. While they may not be suitable for providing quotable definitions of theology (see above), they might still be adequate conversation partners for discussing what theology precisely is! Two dynamics strongly indicate that this development will become very important in the future. First, the fact that in the app version of ChatGPT it is possible to actually talk, almost in real time, to the LLM leads, in my view, to an entirely new quality of experience in the interaction with LLMs. A student can now lie down in their bed, close their eyes, and have a relaxed conversation about a Biblical book, at the speed and in the style that they need. Second, even without finetuning an LLM (which would mean using highly specialized training data to adjust the probability values of the tokens), it is now possible to ‘customize’ an LLM, namely GPT-4.[46] While overhyped in some respects, the pedagogical innovation potential of these customized GPTs is indeed enormous. After all, this service from OpenAI allows students not only to determine how the GPT is supposed to interact with them but also to upload PDFs as ‘background knowledge’ that the LLM can then draw upon. To be sure, these developments might converge in a future situation in which in-person teaching, and institutions of higher education in general, come under scrutiny, for better or worse.

[44] On this relationship between writing and reasoning, see already Applebee, Writing.
[45] Van der Weel and Van der Meer, Handwriting.
[46] https://chat.openai.com/gpts.
There can be no doubt, however, that even at present this already constitutes a supplement to more conventional university teaching, and that the value of such a customized dialogue partner could be immense for the many students who would otherwise not have such competent conversation partners. These considerations also point toward the answer to the question of what kinds of skills will be most important in an ‘AI-dominated’ future. Should we all complete courses on ‘how to prompt’ in the most efficient manner? I think nothing could be further from the truth. Even the customized GPTs, which require only natural language to be set up, already point in a totally different direction. Those who want to make use of GPT as an extension of their own creative capabilities need to become one thing and one thing only: good communicators, people who can, first of all, independently think through issues and are, then, capable of composing texts, written and spoken ones, that adequately express their intentions.

Bibliography

Anthropic: Claude 2, https://www.anthropic.com/news/claude-2. Last access: March 27, 2024.
Anthropic: The Claude 3 Model Family. Opus, Sonnet, Haiku, https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf. Last access: March 27, 2024.
Applebee, Arthur N.: Writing and Reasoning, Review of Educational Research 54 (1984), 577-596.
Baldi, Brian/Mejia, Cynthia: Utilizing Slow Reading Techniques to Promote Deep Learning, International Journal for the Scholarship of Teaching and Learning 17 (2023), 13.
Bender, Emily M. et al.: On the Dangers of Stochastic Parrots. Can Language Models Be Too Big? 🦜, Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (2021), 610-623.
Brown, Tom B. et al.: Language Models are Few-Shot Learners, Advances in Neural Information Processing Systems 33 (2020), 1877-1901.
Bubeck, Sébastien et al. (Microsoft Research): Sparks of Artificial General Intelligence. Early Experiments with GPT-4, arXiv, https://doi.org/10.48550/arXiv.2303.12712. Last access: March 27, 2024.
Claude 3/Christoph Heilig: An AI-Generated Exegesis of 2 Cor 2:14, https://www.academia.edu/116460959/An_AI_Generated_Exegesis_of_2_Cor_2_14. Last access: March 27, 2024.
Evangelisch-Theologischer Fakultätentag: Beschlüsse 2023, http://www.evtheol.fakultaetentag.de/PDF/ETFT_2023%20Beschlusse02.pdf. Last access: March 27, 2024.
Finnern, Sönke/Rüggemeier, Jan: Methoden der neutestamentlichen Exegese. Ein Lehr- und Arbeitsbuch (UTB 4212), Tübingen 2016.
Gemini Team: Gemini 1.5. Unlocking Multimodal Understanding across Millions of Tokens of Context, arXiv, https://doi.org/10.48550/arXiv.2403.05530. Last access: March 27, 2024.
Grace, Katja et al.: Thousands of AI Authors on the Future of AI, arXiv, https://doi.org/10.48550/arXiv.2401.02843. Last access: March 27, 2024.
Heilig, Christoph/GPT-4: Digitale Reformation. Wie KI die Theologie transformiert, feinschwarz, https://www.feinschwarz.net/digitale-reformation-wie-ki-die-theologie-transformiert/. Last access: March 27, 2024.
Heilig, Christoph: Paul’s Triumph. Reassessing 2 Corinthians 2:14 in Its Literary and Historical Context (BTS 27), Leuven 2017.
Heilig, Christoph: Paulus als Erzähler? Eine narratologische Perspektive auf die Paulusbriefe (BZNW 237), Berlin 2020.
Heilig, Christoph: The Apostle and the Empire. Paul’s Implicit and Explicit Criticism of Rome, with a foreword by John M. G. Barclay, Grand Rapids 2022.
Heilig, Theresa/Heilig, Christoph: Historical Methodology, in: Heilig, Christoph et al. (eds.): God and the Faithfulness of Paul. A Critical Examination of the Pauline Theology of N. T. Wright (WUNT II 413), Tübingen 2016, 115-150.
Hubert, Kent F. et al.: The Current State of Artificial Intelligence Generative Language Models Is More Creative Than Humans on Divergent Thinking Tasks, Scientific Reports 14 (2024), article number 3440, https://doi.org/10.1038/s41598-024-53303-w.
Lauster, Jörg: Mehr Licht. Überholte Strukturen, Süddeutsche Zeitung, https://www.sueddeutsche.de/kultur/theologen-ausbildung-reform-1.5687098. Last access: March 27, 2024.
Luccioni, Alexandra Sasha et al.: Power Hungry Processing. Watts Driving the Cost of AI Deployment?, arXiv, https://arxiv.org/abs/2311.16863. Last access: March 27, 2024.
Markschies, Christoph: Wenn die künstliche Intelligenz halluziniert. Ein Selbstversuch mit einem viel diskutierten Chat-Bot, zeitzeichen, https://zeitzeichen.net/node/10326. Last access: March 27, 2024.
MLA Style Center: How Do I Cite Generative AI in MLA Style?, https://style.mla.org/citing-generative-ai/. Last access: March 27, 2024.
Muennighoff, Niklas et al.: s1. Simple Test-Time Scaling, arXiv, https://arxiv.org/abs/2501.19393. Last access: March 25, 2025.
OpenAI: GPT-4 Technical Report, arXiv, https://doi.org/10.48550/arXiv.2303.08774. Last access: March 27, 2024.
Schneider, Jan Georg: Intelligible Texturen. Welche Rolle kann ChatGPT bei der Aufsatzbewertung spielen?, VK:KIWA, https://zenodo.org/records/10877034. Last access: March 25, 2025.
Streett, Daniel R.: Greek Professors. Do They Know Greek?, https://danielstreett.com/2011/09/12/greek-professors-do-they-know-greek-basics-of-greek-pedagogy-pt-3/. Last access: March 27, 2024.
Universität Basel (Vizerektorat Lehre): Leitfaden „Aus KI zitieren“. Umgang mit auf Künstlicher Intelligenz basierenden Tools, https://www.unibas.ch/dam/jcr:4946902a-49d7-4539-8968-2e81879d6b96/Leitfaden-KI-zitieren_Apr-2023.pdf. Last access: March 27, 2024.
Van der Weel, F. R./Van der Meer, Audrey L. H.: Handwriting But Not Typewriting Leads to Widespread Brain Connectivity. A High-Density EEG Study With Implications for the Classroom, Frontiers in Psychology 14 (2023), 1-9.
Wilder, Nicolaus et al.: Forschungsintegrität und Künstliche Intelligenz mit Fokus auf den wissenschaftlichen Schreibprozess. Traditionelle Werte auf dem Prüfstand für eine neue Ära, in: Miller, Katharina et al. (eds.): Verlässliche Wissenschaft. Bedingungen, Analyse, Reflexion, Darmstadt 2022, 203-223.
Writing is Thinking, Nature Reviews Bioengineering 3 (2025), 431.