
5. Barely Beyond the Book?

Joris van Zundert

© Joris van Zundert, CC BY 4.0

‘There is nothing deterministic about the Internet’1

Of methodological interaction and paradigmatic regression

This is a story about the methodological interaction between two scientific fields, that of textual scholarship and that of computer science. The names of the fields, however, only imprecisely delineate the permeable boundaries between research domains where methodologies interact—for obviously the world is much more fluid than such nouns suggest.2 The interactions of interest are much more complex than the simplified image of a dynamic whereby one field donates a methodology to another. Rather than trying to reflect on the current state and the future potential of the digital scholarly edition from well inside the field of textual scholarship, let us approach the topic from the perspective of the multidisciplinary methodological interaction that has arisen over recent years to support the theoretical and practical development of the digital scholarly edition. Textual scholarship in its digital fashion belongs to the broader field of Digital Humanities, itself a field built on interdisciplinarity, where many skills and theories from the realms of computer technology and scholarship intersect, and thus where many new interfaces and interactions arise between those skills and the fields they are tied to.3 This is where Digital Humanities acquires its innovative power, or at least the promise of that power.

That innovative power, however, can be both exciting and confusing. The point where disciplines intersect is not a space for the calm, cool and collected exchange of technical and methodological knowledge. Rather, it is a place where the inherent social aspects of science and research are brought markedly into the foreground.4 Take for example Jan Christoph Meister’s description of the ‘lamented conflict between “computationalists” and “humanists”’. This conflict, Meister states,

arises as soon as we become afraid of our own courage and shy away from jumping across these two fault lines. Let’s cut through that fear. The task remains […] to ‘become capable of both—the metaphor and the formula, the verse and the calculus […]’. That’s a borderline experience, no doubt, and those who prefer to pitch their tent in the comfortable centre of either laager don’t run the risk of questioning their own philosophical, epistemological and ethical identity as easily.5

Meister’s word use is notably emotive (‘afraid’, ‘fear’, ‘courage’) and at the same time vividly touches on the impact of the social dimension (‘conflict’, ‘borderline experience’, ‘risk’) of the epistemological interaction that is expressed. As Christine Borgman has suggested this is a situation where it can be useful, with respect to the design of scholarly infrastructure, to take these interactions and the behaviour connected to them as the objects of study.6 Let us do exactly that here. Taking the digital scholarly edition as a part of the scholarly infrastructure for textual scholarship, we can try to infer what the historical interactions between textual scholarship and computer science tell us about the current state and development of the digital scholarly edition.

The field of science and technology studies (STS) offers a useful frame for critical study and reflection on what occurs at the interfaces of the various research fields within digital scholarship. When these fields intersect, it is not simply a question of objective interactions concerning technology and methodology; rather, these interfaces are also the site of social processes that guide and steer the methodological interaction. Within STS such processes are often referred to as the social shaping of technology—that is, the mutual interplay between technology, its developers or champions and the users of that technology. It is this interplay that changes the properties and applications of the technology at hand. For example, such interplay is very prominent in software development, in which development iterations and lifecycles are a clear expression of the interaction between builders and users as they shape software until the users’ requirements are satisfied.7

I have previously argued that social shaping of technology can lead to ‘paradigmatic regression’.8 These are acts of shaping that translate an expression of the paradigm of the new technology into an expression of a paradigm that is already known to the user. Resistance to new technologies, where the use or sophistication of the new technology is denied, can of course be a motivator of paradigmatic regression.9 Not all regressions are necessarily motivated by conservatism or resistance, however. But even when users do embrace a new technology, the act of its social shaping may create a paradigmatic regression effect. An example of this effect can often be found when a metaphor is used in a graphical user interface (GUI). GUI metaphors are used to convey the processes or data underlying a particular piece of software in a manner that is meaningful or intelligible for human users. In order to help the user understand a new target domain or a new paradigm, it is expressed by way of a conceptual domain or a paradigm that is already known to the user. An obvious example is the metaphor of the desktop, which was used to communicate the functions of a PC to as broad an audience as possible.10 The only trouble is that such metaphors are necessarily incomplete as they conceal both the good and the bad of the deeper computational model. Inconsistencies in the model are hidden by a metaphor that suggests completeness to the user. Equally, metaphors hide useful functions and possibilities of the model that are not covered by the metaphor’s originating paradigm.11 In our example, the desktop metaphor does nothing to reveal the power of automation that a PC delivers to its user. GUI metaphors are probably best viewed as the expression of the assumptions that software developers hold about the user’s interaction with the underlying model—but not, in any case, as a transparent and effective way of allowing the user to engage with the computer’s raw power. 
Metaphors are in this respect paradoxical: what is meant to be a transparent means of interaction with new possibilities of a computational model is in fact an opaque barrier confining the user to a well-rehearsed collection of concepts and processes.

What happens at the intersection?

Paradigmatic regression is not only to be found in graphical user interfaces; we can observe similar dynamics at the level of methodological interaction between or even within research domains. To understand how paradigmatic regression can also occur as a result of the interaction between computer science and textual scholarship, it is useful to view this interaction through the lens of an existing analytical metaphor: the trading zone.

The processes at the intersection of research domains (such as textual scholarship and computer science) have been compared to those in trading zones.12 Whether they are zones of economic activity or those where methodologies of different fields are amalgamated, pidgins commonly arise in such places. As Peter Galison says: ‘A reduced common language, which begins with participants in a zone agreeing on shared meanings for certain terms, then progresses to a kind of pidgin and eventually to a creole, which is a new language born out of old ones’.13 Galison also draws attention to the possible existence of visual and mathematical creoles. Indeed, these are not hard to identify in Digital Humanities: a good example can be seen in the works of Franco Moretti, who has methodologically integrated quantification and visualisation methods such as graphs, maps and tree heuristics into comparative literature studies.14 Nor is it very hard to identify current Digital Humanities as a whole with a new expert community as, according to Galison, they may take shape during the ‘creole stage’ at the intersection of domains. It has been argued that, in creoles of natural language, it is the subordinate group that provides most of the syntactic structure for the creole, whereas the dominant group provides lexical items and concepts. Though Galison provides some empirical observations, it remains an open question whether the same patterns hold for the emergence of methodological pidgins at the interface of different research domains.

What interests us here is whether we can indeed observe the formation of a methodological creole in the emerging vocabulary of Digital Humanities, and whether hints can be found in that vocabulary of a regressive dynamic similar to that observed at the graphical user interface level. It may be that Matthew Kirschenbaum provides us with some—admittedly still anecdotal—evidence of precisely such a dynamic. In a recent article, Kirschenbaum attempts to trace the origin of the label ‘Digital Humanities’. He identifies a key moment, reported to him by John Unsworth, which seems to have been the tipping point that would propel this label towards its current status of de facto denominator of what then was and still is a non-homogeneous research domain. Unsworth relates the choice of ‘Digital Humanities’ to a discussion surrounding the title of the Blackwell 2004 Companion to Digital Humanities: ‘Ray [Siemens] wanted “A Companion to Humanities Computing”, as that was the term commonly used at that point; the editorial and marketing folks at Blackwell wanted “Companion to Digitized Humanities”. I suggested “Companion to Digital Humanities” to shift the emphasis away from simple digitization’.15 Of course we cannot take this as a pars pro toto for the social shaping dynamics of a whole field, but it is suggestive. Ray Siemens by no means stands alone in his preference for ‘Humanities Computing’. Susan Hockey, for instance, titled her contribution to this very companion ‘The History of Humanities Computing’.16 Significantly, it is the prominent authorities in the field, veritable Nestors, who consistently speak of ‘Humanities Computing’—people like Dino Buzzetti: ‘humanities computing—I still prefer this designation to digital humanities’.17 According to Unsworth, the term was ‘commonly used at that point’, yet the publishers preferred the new term in order to broaden the appeal of the concept by choosing a metaphor that felt less challenging.
This was a small but pivotal event in the history of the field, which simultaneously points to the state of Digital Humanities as a methodological pidgin and to an act of paradigmatic regression. The vocabulary juxtapositions in both terms are constructs of a methodological pidgin. Where ‘Humanities Computing’ suggests an equal interaction or relation between two fields with a stress on computational activity, the term ‘Digital Humanities’ (purposefully or not) pushes the balance back toward the domain of humanities and subjugates the computational/digital aspect as a partial property of that field. Or in the words of Willard McCarty: ‘Note, please, the name “digital humanities” grammatically subordinates the digital […] “Humanities computing” takes advantage of the ability in English to make a noun serve as an adjective while staying a noun, and it draws upon the participle/gerund ambiguity. But it seems I’ve lost this contest!’18

The trading zone and digital textual scholarship practice

Scholarly digital editions and the sites where they are conceived and created, virtual or concrete, are themselves methodological trading zones that materialise at two levels. There is a laboratory-like setting tied in a relatively small context to the practice of preparing and publishing a concrete digital scholarly edition—and possibly also the development of a specific technical infrastructure connected to it. At a more abstract level we find a theoretical discussion that connects to the methodological and epistemological histories of textual scholarship, knowledge representation and digital technology. The critical study of these trading zones along empirical ethnographic lines—another approach often applied in science and technology studies—would have much to tell about the methodological interaction between computer science and textual scholarship. Although such an elaborate study has yet to be undertaken, even fairly anecdotal observations nevertheless yield some intriguing insights.

The Huygens Institute for the History of the Netherlands is home to an example of a smaller-scale trading zone in a laboratory setting.19 The institute encompasses a computer science and software development group that is relatively large by the standards of humanities research, numbering around fourteen professionally trained or educated IT developers. Various members of this group have distinct strengths, such as interface design, data modelling, architecture integration and text analytics. The group works closely with at least three researchers who are themselves closely involved in the national and international Digital Humanities community. Through numerous projects, group members are also in close productive contact with most of the other researchers in the institute and with external researchers active in relevant projects. The projects themselves cover a large part of the spectrum of Digital Humanities undertakings, from data modelling and repository building,20 through digital scholarly editions such as the correspondence of Vincent van Gogh,21 to analytical tool building, of which the text collation engine CollateX22 is an example.23

The research staff of the institute originally had no particular focus on digital or computational activities. In 2005 the institute took the strategic decision to move into the domain of digital scholarly publications as well. The initiative began with the addition of a literary researcher and two developers to the institute. Staff at a related institute, later dissolved, had been developing a ‘collaboratory’ for the curation and analysis of humanities and social science data, which today would be called a Virtual Research Environment (VRE). At the Huygens Institute the part of this environment relevant to the humanities, consisting mainly of a transcription and publication environment for historical texts, was adopted and strongly pushed forward, while the social science aspect was eventually abandoned. This became the eLaborate online environment, ‘in which scholars can upload scans, transcribe and annotate text and publish the results as an online text edition’.24 eLaborate is a web-based environment where textual scholars find support for basic tasks in creating and editing a digital scholarly edition. A project in eLaborate is essentially a container for a series of scanned manuscript or print text pages that can be arranged arbitrarily in a tree structure. Fine-grained authorisation allows access or restrictions to be set down to page level, and thus supports private, collaborative or fully open edition workflows. A text editor is provided to aid in creating diplomatic and critical transcriptions, which can be layered with annotations to serve the researcher’s or reader’s needs. All data is stored and retrievable as XML. eLaborate facilitates the automated publishing of web-based editions and provides a generalised graphical interface based on ‘fluid’ columns. Vertical areas of the screen can be arbitrarily arranged for visualising the reading text, connected annotations, browsing in the text structure, full text search and so forth.
Given some basic training, eLaborate provides an out-of-the-box solution allowing textual scholars with only average computer skills to create basic digital scholarly editions without much need for technical support.

It is relevant to note that the IT team adopted an Agile software development methodology. This type of software development takes a markedly user-centred and evolutionary approach to building software. Short one- or two-week iterations deliver functioning parts of the software that are evaluated by the client/user. This ensures that software production stays balanced with the evolving vision and knowledge of the client. Arguably this methodology feeds into the social shaping aspects of introducing new technologies and methodologies.25

A case study of the methodological dynamics surrounding the development of eLaborate serves to show that the trading zone metaphor is not unproblematic. Do the dynamics and interactions in the context—the work site—where eLaborate was developed point to the emergence of a methodological pidgin? Most certainly the developers and the researcher who headed the project started exchanging terminology. The developers began to refer to concepts such as ‘page’, ‘annotation’, ‘transcription’. The researchers grew accustomed to using words such as ‘user’, ‘interface’, ‘architecture’, as well as the vocabulary that is rather typical for the agile methodology used by the developers: ‘planning game’, ‘iteration’. Whether this constitutes a beginning of a methodological pidgin is debatable. The interactions that led to the exchange of vocabulary could equally be attributed to standard development practice in which there is a particular relationship between client and service provider and in which, certainly within agile methodology, the provider normally tries to understand the client’s work process and concepts in order to model them into software. The objective of the developers in that case is simply to mimic as closely as possible the concepts the client is using. Arguably this could cause a medium shift in which the researcher ends up with a digital environment that is virtually identical to his or her known analogue work process and material. Once the work is done, the client and developer can go their separate ways, without having essentially influenced the methodologies on either side.

A clearer indicator of methodological change may be the actual loss of lexical items. During the eLaborate project it transpired that an index—in the sense of the keyword reference list in the back of a book—is not a very useful instrument to mimic in a digital environment if the texts at hand are automatically indexed and the interface includes a full-text search function that presents its result as a list of keywords in context. In various edition projects where eLaborate was deployed some friction and dissonance could be observed among users (either textual scholars or trained volunteers who transcribed manuscript material) about the lack of an index, but gradually the use of full text search as a replacement for the index became accepted, even appreciated, once the possibilities for wildcard and fuzzy search were understood. This is notwithstanding the fact that a full text search is not the epistemological equivalent of an index. Current full text indexing technology does not, for instance, facilitate named entity resolution in the same way as traditional indices may. Nevertheless, within projects based on eLaborate the concept of ‘index’ is no longer used except for references to the past; the concept of ‘zoekfunctie’ (search function) seems to have all but replaced it. For textual scholarship and scholarly editing I would argue that the loss of the ‘analogue representation’ of an index and even the lexical reference to it does indeed constitute a methodological change.
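The replacement described above can be made concrete with a small sketch. The function below (a hypothetical illustration, not code from eLaborate) returns keyword-in-context snippets for a search term, with a simple ‘*’ wildcard standing in for the wildcard search that users came to appreciate; this is the kind of result list that, within eLaborate projects, came to take the place of the back-of-book index:

```python
import re

def kwic(text, term, width=30):
    """Return keyword-in-context snippets for every match of `term`.

    A minimal sketch of a full-text search result list. Supports '*'
    as a simple wildcard matching any run of word characters.
    """
    # Escape the term, then turn the escaped '*' into a word-character wildcard
    pattern = re.escape(term).replace(r'\*', r'\w*')
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append(f"...{left}[{m.group(0)}]{right}...")
    return hits
```

A query such as `kwic(text, "abb*")` would gather ‘abbot’, ‘abbess’ and ‘abbey’ into one context list—precisely the kind of aggregation a hand-crafted index entry used to provide, though, as noted, without the named-entity resolution a traditional index can offer.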

The same event shows the dynamics of social shaping and regression in a different way. With the indexing technology used in eLaborate—first Lucene and later Solr—it is possible to generate search result lists with text context ranked by ‘relevance’.26 Although the keyword-in-context search results list eventually found unanimous adoption, the concept of ‘relevance’ became a topic of recurring and fractious debate. Lucene applies a combination of Boolean and vector space models to determine the relevance of documents to a user’s query. The Boolean measure selects the documents that correspond to the terms the user wishes to find or ignore. A vector space model is then applied to that selection to rank the relevance of each document to the query. Formally this model determines relevance by applying a cosine measure to the vectorised document vocabulary and query.27 The vocabulary of any text can be expressed as a mathematical vector and the basic trigonometric function of the cosine can be applied to determine the size of the angle between two such vectors. This essentially means that the smaller the angle, the more the vocabularies of two texts are similar. In Lucene this measure is used to determine if requested search terms appear more often in a particular document than on average in the vocabulary of all documents retrieved with a specific query. The more such terms appear in a document, the higher the relevance ranking of that document. It transpired that the textual scholars and other users confronted with this technology were for the most part unimpressed with the relevance ranking, which appeared incomprehensible and alien to them. And although the feature was initially presented in the interface, most edition projects within eLaborate preferred canonical orderings such as sorting by folio number, name of author or text, shelf mark etc. 
As a result, word-weighted ranking is no longer offered in the editing and publication interfaces of eLaborate, and the researcher in charge of the development confirmed that in the several rounds of open testing that the software underwent, none of the trained users requested the function.28
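The vector space step described above can be sketched in a few lines. The toy implementation below (an illustration of the general principle, not Lucene’s actual scoring, which adds tf-idf weighting, field norms and Boolean pre-selection) turns each text’s vocabulary into a term-frequency vector and uses the cosine measure to rank documents against a query:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine of the angle between two term-frequency vectors.

    The smaller the angle between the vectors, the more similar the
    vocabularies; cosine 1.0 means identical direction, 0.0 no overlap.
    """
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def rank(query, documents):
    """Rank documents by cosine relevance to the query, highest first."""
    qv = Counter(query.lower().split())
    scored = [(cosine(qv, Counter(d.lower().split())), d) for d in documents]
    return sorted(scored, key=lambda s: s[0], reverse=True)
```

For a one-word query, a document consisting mostly of that word scores near 1.0, while a document that never mentions it scores 0.0—which is the ordering principle the eLaborate users set aside in favour of folio number, author or shelf mark.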

The virtual disappearance of automatic ranking by relevance as a function in the current version of eLaborate is a case of social shaping of technology, and indeed of paradigmatic regression. Ranking by relevance could arguably be methodologically useful for textual scholars who must peruse a large corpus for occurrences of themes, words and motifs. Even if it is not the default, one would expect the option to be available. Technically there are no barriers to providing the function, as it is the default behaviour of the search engine used. In fact, it took additional development effort—though admittedly not much—to provide canonical ordering. Despite all this, the functionality that is standard from the technical point of view is no longer available—a strong signal that the IT developers and the textual scholars found a barrier to knowledge exchange that they were unable to overcome. In other words, they could not create the required methodological pidgin to communicate or appreciate the possible utility of that function.

What is interesting here is not so much the disappearance of relevance-based ranking. There may be valid scholarly reasons to reject such an ordering principle—albeit that these have not been put forward by the users in this case. Rather, it serves as an example in which the pidgin, the ‘reduced common language’ used during the interaction between developers and researchers, was not sufficient to communicate the methodological potential of a relatively straightforward, seemingly useful and non-intrusive method, and so prevented its theoretical consideration. This example shows how difficult it actually is, both for researchers and for developers, to use the trading zone for methodological gain or innovation. The textual scholars involved first needed to know of the existence of such a thing as ‘ranking by relevance’ to be able to recognise its possible methodological potential. Next, to establish that potential would require them ultimately to drill down to the mathematics of the cosine measure for vector comparison and understand how vectors can represent documents. As has been argued elsewhere in a similar vein, without such a detailed level of knowledge, it is difficult to assess the methodological usefulness of new technologies.29

It should be noted additionally that this is a small example involving relatively standard digital technology. The syntactical and lexical distance that must be bridged in the case of a project such as Circulation of Knowledge and Learned Practices in the 17th-century Dutch Republic is significantly larger,30 as in that project correspondences are visualised through network analysis.31 A sensible understanding of what may be inferred from network visualisations and what this adds in terms of methodology requires a fairly deep grasp of the mathematical models underpinning not only network modelling and analysis in general, but also the topic modelling used to generate the network data.32

All in all, this raises the question of how much methodological interaction is actually realised in a methodological trading zone in a smaller, concrete context. Some superficial vocabulary is certainly exchanged, some of which may prove instrumental in future co-operation for both researchers and developers. But there is little in the way of deep methodological trading going on. Textual scholars are not providing developers with knowledge about theoretical notions of scholarly editing and literary criticism; and, vice versa, developers are not lecturing researchers on mathematical or computational principles. The common language does no more than create an interface that answers to the perceived needs of researchers in the humanities. The interface becomes an expression of these researchers’ conceptions of how the digital technology might serve their purpose.

The methodological gain in this is rather superficial: access and discovery increase in scope, but concepts and processes hardly change. There is a digital translation, but little methodological innovation. The potential or realised methodological innovation furthermore happens rather covertly. In the case of the relevance ordering in eLaborate the potential is there, but hidden—again!—by a graphical interface, and by an apparently suboptimal methodological exchange between researchers and developers. In the case of the Circulation of Knowledge project, the mechanics, technology and methodology are almost completely covertly integrated into the resulting digital environment by the computer scientists. A further consequence was that the main technical developer struggled with negative feelings about lack of recognition for methodological merit. The covertness of this methodological innovation is far from trivial. If, as Peter Shillingsburg has pointed out, editions are scholarly and critical arguments about what a textual record means or about how it should be read, then a digital edition is also such an argument.33 Because both interface and model are constituents of the digital edition, they are both part of that intellectual argument. The model—i.e. the combination of the data model and the computer language logic that puts it into action—is entirely conceived by computer science experts. The interface and the view it offers on that model, including the functions of the model it exposes to or hides from the outside world, is to a very large degree conceived by developers and designers. The methodology used for this is effectively inaccessible to the textual scholars, who lack the skills to interpret and comprehend the technologies used. 
Given that the computer scientists create so much of the intellectual argument pertaining to a particular digital scholarly edition, it would seem that having a sufficiently broad common methodological language is pivotal to digital textual scholarship. But as we can see, our current dynamics of interaction are not helping to create it.

Trading theory in the larger textual scholarly context

Although the trading zone between computer science or digital technology and textual scholarship appears so problematic at the smaller, more concrete level, there seems to be no shortage of methodological trading on the theoretical level. Exhaustively detailing and disentangling the intricately intertwined histories of textual scholarship, knowledge representation, literary criticism, computing and digital technologies is hardly feasible in the span of this chapter. Moreover, writing history often suggests a falsely deterministic account of cause and effect. Nevertheless, it is important to identify a number of key developments. The beginnings of the Internet and the World Wide Web are usually identified with Vannevar Bush’s vision of the Memex, an imaginary system to store, track, index and retrieve any information, and—crucially—to rewrite that information and keep versioning records so as to trace the development of our thoughts.34 Visions of such knowledge systems reach much further back, however, at the very least to the work of Paul Otlet in the early twentieth century, as has been repeatedly shown.35 It was Theodor Nelson who coined the term Hypertext and constructed a theory for it, inter alia referring back to Bush.36 Nelson’s attempts at implementing his visions failed to result in successful tools; instead it was Tim Berners-Lee whose team devised the Hypertext Transfer Protocol, which successfully kick-started the World Wide Web, with reference to the work of Nelson.37 Although sympathetic to his endeavour, Nelson deeply dislikes Berners-Lee’s technical solution:

It is vital to point out that Tim’s view of hypertext (only one-way links, invisible and not allowed to overlap) is entirely different from mine (visible, unbreaking n-way links by any parties, all content legally reweavable by anyone into new documents with paths back to the originals, and transclusions as well as links—as in Vannevar Bush’s original vision).38

Imperfect or not, HTTP technology happened to align nicely with many ideas on the nature of knowledge and text that were emerging in literary criticism, textual theory and semiotics, which increasingly problematised a linear view of text and resulted in more post-structuralist approaches. George Landow summarises the convergence:

Hypertext, an information technology consisting of individual blocks of text, or lexias, and the electronic links that join them, has much in common with recent literary and critical theory. For example, like much recent work by poststructuralists, such as Roland Barthes and Jacques Derrida, hypertext reconceives conventional, long-held assumptions about authors and readers and the texts they write and read. Electronic linking, which provides one of the defining features of hypertext, also embodies Julia Kristeva’s notions of intertextuality, Mikhail Bakhtin’s emphasis upon multivocality, Michel Foucault’s conceptions of networks of power, and Gilles Deleuze and Felix Guattari’s ideas of rhizomatic, ‘nomad thought’. The very idea of hypertextuality seems to have taken form at approximately the same time that poststructuralism developed, but their points of convergence have a closer relation than that of mere contingency, for both grow out of dissatisfaction with the related phenomena of the printed book and hierarchical thought.39

Digital textual scholarship and more particularly the digital scholarly edition obviously rely on the technologies delivered by the development of the Internet and the hypertext protocol. In turn, these technologies are rooted in theory which sees the nature of knowledge, information and documents as highly interconnected and referential, or intertwingled and transclusional, as Nelson would in all likelihood phrase it. Peter Robinson expresses similar views when he discusses the idea of ‘distributed editions’, with attribution also to Peter Shillingsburg and Paul Eggert.40 Robinson is interested in the volatile aspects of editions. He posits that readers may become writers too, and proposes that editions may exist in a distributed fashion in an interactive web-based space. Each reader may have a different representation: ‘a manuscript transcription from one site, a layer of commentary from one scholar, textual notes and emendations from another, all on different servers around the globe. In a sentence: these will be fluid, co-operative and distributed editions, the work of many, the property of all’.41 According to George P. Landow, this vision is strongly associated with the Docuverse, the ideas on nonlinear writing and hypertext systems described by Nelson:

Perhaps the single most important development in the world of hyper-media has been the steady development of read-write systems—of the kind of systems, in other words, that the pioneering theorists Vannevar Bush and Theodor H. Nelson envisioned. Blogs, wikis […] all represent attempts to bring to the Web the features found in hypertext software of the 1980s that made readers into authors.42

But ideas on more interactive and volatile editions also refer to another complex of theory surrounding the fundamental instability of text. This complex encompasses a post-structuralist view of text where text is not a book but a hypertext, and where hypertext stresses the volatility of text, its heterogeneous, mutable, interactive and open-ended character—ideas rather opposed to that of text as an immutable form enclosed and bound by a front and back cover in a book. This theoretical complex also borrows from ideas on the fluidity of text as expressed for example by John Bryant, who calls attention to the perpetual flux texts show through preprint revisions, revised editions, and adaptations that shape literary works into forms specific to different audiences.43 Similarly, the importance for scholarly editing of the volatile aspects of text is expressed through what has become known as critique génétique, an approach to editing that focuses on the avant-texte, the process of writing and revision that precedes the publication of a book.44

The instability and process aspects of text are also important to textual scholarship and the practice of scholarly editing from the point of view of the use of editions: of what happens after publication. The ideas behind hypertext, together with those about read-write systems, also inform ideas concerning the social aspects of text and scholarly editing. Read-write systems facilitate crowdsourcing and thus open up the process of scholarly editing to a potentially far larger source of labour by ‘expert amateurs’45 than the individual scholar could provide for.46 Crowdsourcing engages an audience of users in the scholarly process literally in the avant-texte phase of the creation of a scholarly edition. This potential need not be confined to, say, the transcription stage of a scholarly project. Meanwhile, ideas have been developed on the so-called social edition, which allows readers/users to add their knowledge to the edition and render its creation and use a community event under the guidance of scholarly experts.47 Lastly, the process aspect of text is also highlighted through new computational engagements that readers/users may make with texts and scholarly editions. This aspect was already expressed as early as 1949 through what is now usually seen as the first application of Humanities Computing: the work of Roberto Busa,48 which led to the computational means necessary to derive automatically a concordance to the works of St Thomas Aquinas.49 This was the beginning of a long development that prefigured current computer-supported analytic engagement with literary texts such as distant reading, algorithmic reading and big data analysis.50

The shape of the digital edition according to reality

In short, the interaction between digital technology and textual scholarship places the focus of methodology on both the unstable and fluid aspects of text, and on the process aspects of texts. That is the fundamental tenet that computer science brings to textual scholarship.

Hypertext, unlike print, is fundamentally process- and context-oriented. Following a basic tenet of artificial intelligence theory, it views representing and acquiring knowledge as a problem of defining and searching information spaces, and it recognizes that these spaces and search methods will vary according to the purposes and abilities of particular users.51

Digital scholarly editions are indeed information spaces. But they are not often information spaces that line up with the theoretical pidgin discussed above. The theoretical notions of textual scholarship, and of the scholarly digital edition that we find in the trading zones between textual scholarship and computer science, call for an expression of text and editions through which the information contained in the edition is expressed primarily according to the principles of hypertext. Current reality, however, is very different. In textual scholarship, Internet nodes are mostly placeholders that point via a URL to a digital document or to a digital edition as a whole, as a data silo. The edition of the Van Gogh letters, for instance, sits at its node as a fully integrated and monolithic pile of edited text from letters; the pile includes comments, annotations, translations and so on. The finest granularity presented to the network of the web is at the level of the individual letter. Even that letter-level URL identifies a compound object, that is, a meaningful set of multiple scholarly objects: two facsimiles, a transcribed text and annotations, bound together by an interface that (again following Shillingsburg) represents an editorial argument about what constitutes the digital scholarly edition of this particular letter. According to this argument, there is no need to address the transcription, the facsimile or a particular annotation in isolation. Most digital scholarly editions on the Web are expressed similarly. This is hardly better than a network of nodes in which each node represents a particular edition offered as a PDF. This situation renders it impossible to address texts (and thus editions) beyond their graphical interface in ways compatible with a hypertext model.
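
The contrast between a compound, interface-bound object and a hypertext of independently addressable scholarly objects can be made concrete in a minimal sketch. All URLs, identifiers and relation names below are hypothetical illustrations, not the actual scheme of the Van Gogh letters edition:

```python
# Two ways of publishing the same edited letter; all identifiers are hypothetical.

# 1. The compound object: one node, one URL, everything bundled behind
#    an interface. The sub-parts cannot be addressed or cited on their own.
compound_edition = {
    "url": "https://example.org/letters/let001",
    "contains": ["facsimile-recto", "facsimile-verso", "transcription", "annotations"],
}

# 2. The hypertext model: each scholarly object is its own node with its
#    own URL, and the relations between objects are explicit links that
#    software (not only a human reader) can follow.
nodes = {
    "https://example.org/letters/let001/facsimile/recto": {"type": "facsimile"},
    "https://example.org/letters/let001/transcription":   {"type": "transcription"},
    "https://example.org/letters/let001/annotation/12":   {"type": "annotation"},
}
links = [
    ("https://example.org/letters/let001/transcription",
     "transcribes",
     "https://example.org/letters/let001/facsimile/recto"),
    ("https://example.org/letters/let001/annotation/12",
     "comments-on",
     "https://example.org/letters/let001/transcription"),
]

def addressable_objects(model):
    """Return the URLs that can be cited or linked to individually."""
    return sorted(model)

# The compound edition exposes exactly one addressable node; the
# hypertext edition exposes one node per scholarly object.
print(len([compound_edition["url"]]))      # 1
print(len(addressable_objects(nodes)))     # 3
```

The point of the second model is not the extra URLs as such, but that the `transcribes` and `comments-on` relations are part of the published network rather than being implicit in a graphical interface.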

Digital editions often trumpet the ability to represent text exhaustively, celebrating the fact that there is no need to make decisions on what to leave out.52 Indeed, it is an asset that digital scholarly editions may be capacious almost without limit. In the case of an important and large tradition of a particular work, this potential may allow for the presentation of all witnesses as items in an inventory, or as a digital archive. Arguably this is not just an asset because of exhaustiveness of representation, but foremost because it allows for the expression of the relations between the witnesses, and thus inter alia the genesis and fluidity of texts—in fact the more process-like aspects of texts—for which the hypertext model as described offers technological expressive potential. In the reality of current digital scholarly editions, however, this potential seems seldom realised. A graphical interface will usually allow the user to select and view single witnesses, or perhaps to compare the texts of multiple witnesses, especially if the editor has integrated a collation or comparison tool such as Juxta.53 The inventory will probably also allow a list of witnesses to be shown in chronological order. The order of that list will in all likelihood be based on a metadata property ‘date’ or similar in the relational database underlying the digital edition archive. The list itself is a generated GUI visualisation expressing that metadata. The point here is that a list so represented is not a hypertext representation of the chronological ‘linkedness’ of the witnesses; it is a mere list of individuated metadata. This is at odds with the idea of hypertext that all information is expressed as machine-negotiable nodes and links, so that an expressive network of knowledge is created.
This means that the chronological order of the witnesses in this case can only be inferred through human cognition from the metadata-based list—it is not represented as knowledge in a computationally tractable form intrinsic to the hypertext medium. Much effort may thus be invested in gathering exhaustive representations of individual witnesses, but if the result of that effort only allows user-level navigation of relational metadata represented as a graphical interface, then the digital scholarly edition is not an effective hypertext knowledge space. Such an edition may still be valuable for its sheer wealth of information, but it remains firmly at the level of document representation for human consumption, without integrating the relations between witnesses into a computationally networked representation.
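
The distinction can be sketched briefly. In the first approach the chronology exists only as a view sorted on a ‘date’ field, generated for the GUI; in the second it is stated as explicit links that software can traverse. The witness identifiers and dates below are invented for illustration:

```python
# A hypothetical inventory of witnesses with a 'date' metadata property.
witnesses = [
    {"id": "W3", "date": 1502},
    {"id": "W1", "date": 1475},
    {"id": "W2", "date": 1488},
]

# Metadata approach: the chronological order is a transient, generated
# view; no relation between witnesses is stored, so only a human reading
# the rendered list infers the chronology.
chronological_view = sorted(witnesses, key=lambda w: w["date"])

# Hypertext approach: the chronology is expressed as explicit,
# machine-negotiable links between nodes, which can be traversed,
# queried, or combined with other link types.
succession_links = [
    ("W1", "precedes", "W2"),
    ("W2", "precedes", "W3"),
]

def chain_from(start, links):
    """Follow 'precedes' links from a witness, returning the chronological chain."""
    chain = [start]
    current = start
    while True:
        nxt = next((t for s, rel, t in links
                    if s == current and rel == "precedes"), None)
        if nxt is None:
            return chain
        chain.append(nxt)
        current = nxt

# The same chronology, but in the second case computationally tractable:
print([w["id"] for w in chronological_view])   # ['W1', 'W2', 'W3']
print(chain_from("W1", succession_links))      # ['W1', 'W2', 'W3']
```

Both produce the same ordering, but only the linked representation makes the ‘linkedness’ of the witnesses part of the published knowledge rather than an artefact of the interface.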

Regression and reaffirmation

There is nothing deterministic about technology, and indeed nothing much deterministic about hypertext. As a technology to express a text and to present it in the form of a digital scholarly edition, hypertext has been shaped by the scholarly community into little more than a filing cabinet for self-contained documents. Most digital scholarly editions on the Internet express the particular idea the scholar responsible for the edition has about what a digital edition is or should be; normally, that idea is a re-representation of the book. We find collections of page-based facsimiles and transcriptions presented as self-contained units, wrapped up in and bound by the front matter that is the interface. There is some attention to fluid aspects and to context. The Hyperstack edition of Saint Patrick’s ‘Confessio’,54 for instance, explicitly offers its users the possibility to venture from the ‘centrality of the text […] through the dense net of textual layers and background information in answer to questions that are likely to arise in their minds’.55 The dense net in question is effectively a star network radiating out from the main page into leaves containing pages of metadata, facsimiles of manuscript folia, or transcriptions of entire texts. Despite the impressive density of information, the information itself is not that densely networked. The relations between the texts and the contextualising information are described, but not expressed through the ‘hyper fabric’ of, for example, HTTP links. Even so, the Confessio is rather an exception to the rule—very few of today’s digital editions seem to be particularly concerned with the core ideal of hypertext as an expression of linked information, of process and context.

Most digital scholarly editions, in fact, are all but literal translations of a book into a non-book-oriented medium. Peter Robinson, writing about the distinction between text-as-work and text-as-document, argues that in the early days of digital editions—roughly until 2005—scholars would privilege the text-as-work perspective, focusing on the potential of digital technology to express and support the properties of text that construct its meaning.56 In recent years, he continues, this trend has been exactly reversed. More recent digital scholarly editions harness the digital medium rather to represent the text-as-document—the faithful re-representation of a text according to its expression in the physical documents that carry it. As an example Robinson points to the online edition of Jane Austen’s fiction manuscripts.57 Elena Pierazzo, who was deeply involved with the methodological design of this edition, unsurprisingly offers a rationale for a text-as-document approach to the digital edition.58 Robinson also notes that many collaborative transcription systems are designed to record text-as-document: not one of the twenty-one tools listed in a survey by Ben Brumfield offers the possibility of recording text-as-work.59 Indeed it is far easier to point to examples of digital scholarly editions that are in essence metaphors of the book, or in other words: translations of a print text to the digital medium, apparently for no other reason than to fulfil the same role as the print text.

Textual scholarly theory, as has been shown, embraces hypertext as a technology which enables the expression of post-structuralist ideas about information, with a focus on the fluid properties of text. It has often been suggested that the capabilities of digital technologies should become the focus and practice of digital scholarly editing. Despite all this, that ideal is not materialising in the form of concrete digital editions, and for similar reasons to those observed in the smaller context of the eLaborate project. Here, too, we find the dynamics of paradigmatic regression in the professional community surrounding the digital scholarly edition. The methodological potential of information technology is hidden by the incomplete metaphors of a paradigm that is itself reaffirmed by becoming the primary interface to the new technology. Robinson argues that the thinking of those scholars who first conceived of the digital scholarly edition shows a strong continuity with earlier contemplation of print editions, resulting in a kind of theoretical pidgin that embraces the new technology, but uses it to express digitally a familiar form for the scholarly edition: the printed book.60 The print edition in that digital translation is a metaphor, but one that begins to hide hypertext’s native potential for expressing referential and conceptual links between texts. The graphical interfaces of digital scholarly editions almost all refer strongly to this book metaphor, reaffirming thereby the paradigm from which that metaphor springs. In the end, the use of the technology has shaped it into a tool to recreate that which is already well known. It is also worth noting that the de facto lingua franca of current digital scholarly editions, TEI-XML, is instrumental in this reaffirmation.61 As an encoding language it is geared fully towards describing text-as-document.
Although not graphical in nature, TEI is thus an interface that, like graphical interfaces, hides many of the essential networking and process characteristics of hypertext. Instead, TEI-XML, with its text-inward orientation, print-text paradigm and hierarchical structure focus, constantly reaffirms the view of the digital edition as representing a text-as-document.

Beyond the book?

There is nothing deterministic about the Internet. The paradigmatic regression we currently see in the digital textual scholarship community is a clear demonstration of that. This community has devised a methodological pidgin that exploits a new technology to express a well-rehearsed paradigm of scholarly editing. Yet this must not be where the methodological shaping and disciplinary trading stops. The theoretical concepts pertaining to the fluidity of text are clearly important to the textual scholarly community, but they still need to be brought fully into the concrete methodological pidgin that is currently geared towards representing a text-as-document, rather than towards text-as-process. As long as scholarly editors keep producing digital metaphors of the book, this will hardly happen. Both textual theorists and computer science practitioners must intensify the methodological discourse to clarify what existing technology is needed to implement a form of hypertext that truly represents textual fluidity and text relations in a scholarly viable and computationally tractable manner—a hypertext language inspired both by computer science and textual scholarship. Without that dialogue we relegate the raison d’être for the digital scholarly edition to that of a mere medium shift, we limit its expressiveness to that of print text, and we fail to explore the computational potential for digital text representation, analysis and interaction.

1 David Lowery, frontman of Camper Van Beethoven and lecturer at Terry College of Business, University of Georgia,

2 Herbert A. Simon, ‘Technology Is not the Problem’, in Speaking Minds: Interviews with Twenty Eminent Cognitive Scientists, ed. by Peter Baumgartner and Sabine Payr (Princeton: Princeton University Press, 1995), pp. 232–48.

3 Susan Hockey, ‘The History of Humanities Computing’, in A Companion to Digital Humanities, ed. by Susan Schreibman, Ray Siemens and John Unsworth (Oxford: Blackwell, 2004), pp. 3–19,

4 Bruno Latour, Science in Action: How to Follow Scientists and Engineers through Society (Cambridge, MA: Harvard University Press, 1988).

5 Jan Christoph Meister, ‘Computationalists and Humanists’, Humanist Discussion Group, 2013,

6 Christine Borgman, ‘The Digital Future Is Now: A Call to Action for the Humanities’, Digital Humanities Quarterly, 3.4 (2009),

7 Cf. e.g. Gwanhoo Lee and Weidong Xia, ‘Toward Agile: An Integrated Analysis of Quantitative and Qualitative Field Data on Software Development Agility’, MIS Quarterly, 34 (2010), 87–114.

8 Joris van Zundert, ‘The Case of the Bold Button: Social Shaping of Technology and the Digital Scholarly Edition’, Digital Scholarship in the Humanities (8 March 2015),

9 Clement Levallois, Stephanie Steinmetz and Paul Wouters, ‘Sloppy Data Floods or Precise Methodologies? Dilemmas in the Transition to Data-Intensive Research in Sociology and Economics’, in Virtual Knowledge: Experimenting in the Humanities and the Social Sciences, ed. by Paul Wouters, Anne Beaulieu, Andrea Scharnhorst and Sally Wyatt (Cambridge, MA: MIT Press, 2013), pp. 151–82.

10 Cf. Readings in Human-Computer Interaction: Toward the Year 2000, ed. by Ronald M. Baecker et al., 2nd ed. (San Mateo: Morgan Kaufmann, 1995).

11 Pamela Ravasio and Vincent Tscherter, ‘Users’ Theories on the Desktop Metaphor ― or Why We Should Seek Metaphor-Free Interfaces’, in Beyond the Desktop Metaphor: Designing Integrated Digital Work Environments, ed. by Victor Kaptelinin and Mary Czerwinski (Cambridge, MA: MIT Press, 2004), pp. 265–94.

12 Peter Galison, ‘Trading with the Enemy’, in Trading Zones and Interactional Expertise: Creating New Kinds of Collaboration, ed. by Michael E. Gorman (Cambridge, MA: MIT Press, 2010), pp. 39–40.

13 Michael E. Gorman, Lekelia D. Jenkins and Raina K. Plowright, ‘Human Interactions and Sustainability’, in Sustainability: Multi-Disciplinary Perspectives (Sharjah, UAE: Bentham Science Publishers, 2012), pp. 88–111.

14 Franco Moretti, Graphs, Maps, Trees: Abstract Models for Literary History (London: Verso, 2007).

15 Matthew Kirschenbaum, ‘What Is Digital Humanities and What’s it Doing in English Departments?’, in Debates in the Digital Humanities, ed. by Matthew K. Gold (Minneapolis: University of Minnesota Press, 2012), pp. 3–11,

16 Hockey, ‘The History of Humanities Computing’.

17 Dino Buzzetti, E-mail message to the author (10 November 2012); Willard McCarty, Humanities Computing (Basingstoke: Palgrave Macmillan, 2005); John Unsworth, ‘What Is Humanities Computing and What Is Not?’, Jahrbuch für Computerphilologie, 4 (2002),

18 Willard McCarty, ‘Computationalists and Humanists’, Humanist Discussion Group, 2013,

23 Ronald Haentjens Dekker et al., ‘Computer-Supported Collation of Modern Manuscripts: CollateX and the Beckett Digital Manuscript Project’, Digital Scholarship in the Humanities, 30 (2014), 452–70,

25 Robert C. Martin, Agile Software Development, Principles, Patterns, and Practices (Upper Saddle River: Prentice Hall, 2002).

27 Dominic Widdows, ‘Word-Vectors and Search Engines’, in Geometry and Meaning (Stanford: Center for the Study of Language and Information, 2004), pp. 131–265,

28 Karina van Dalen-Oskam, E-mail message to the author (10 January 2014).

29 D. Sculley and Bradley M. Pasanek, ‘Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities’, Literary and Linguistic Computing, 23 (2008), 409–24,

31 Charles van den Heuvel, ‘Circulation of Knowledge in the Digital Republic of Letters: Making Correspondences of Manuscripts and Printed Editions Accessible for Research’ (presented at the 5th Liber Manuscript Conference: Promoting Access to Manuscript Content, Paris Bibliothèque Nationale de France, 29–31 May 2012),

32 Peter Wittek and Walter Ravenek, ‘Supporting the Exploration of a Corpus of 17th-Century Scholarly Correspondences by Topic Modeling’, in Supporting Digital Humanities 2011: Answering the Unaskable, ed. by Bente Maegaard (presented at the SDH 2011 Supporting Digital Humanities: Answering the Unaskable, Copenhagen, 2011),

33 Peter Shillingsburg, ‘Is Reliable Social Scholarly Editing an Oxymoron?’, Social, Digital, Scholarly Editing (Saskatoon: University of Saskatchewan, 2013),

34 Vannevar Bush, ‘As We May Think’, The Atlantic (July, 1945), pp. 112–24.

35 W. Boyd Rayward, ‘Visions of Xanadu: Paul Otlet (1868–1944) and Hypertext’, JASIS, 45 (1994), 235–50; Michael Buckland, ‘What Is a “Document”?’, Journal of the American Society for Information Science, 48 (1997), 804–09; Edward Vanhoutte, ‘Paul Otlet (1868–1944) and Vannevar Bush (1890–1974)’, The Mind Tool: Edward Vanhoutte’s Blog, 2009; Charles van den Heuvel and W. Boyd Rayward, ‘Facing Interfaces: Paul Otlet’s Visualizations of Data Integration’, Journal of the American Society for Information Science and Technology, 62 (2011), 2313–26.

36 Theodor Holm Nelson, Literary Machines: The Report on, and of, Project Xanadu Concerning Word Processing, Electronic Publishing, Hypertext, Thinkertoys, Tomorrow’s Intellectual Revolution, and Certain Other Topics Including Knowledge, Education and Freedom (Sausalito: Mindful Press, 1993; first ed. 1981).

37 Tim Berners-Lee, ‘Information Management: A Proposal’ (CERN, 1989),

38 Theodor Holm Nelson, POSSIPLEX: Movies, Intellect, Creative Control, My Computer Life and the Fight for Civilization (Sausalito: Mindful Press, 2010).

39 Hyper/Text/Theory, ed. by George P. Landow (Baltimore: Johns Hopkins University Press, 1994).

40 Peter Robinson, ‘Where We Are with Electronic Scholarly Editions, and Where We Want to Be’, Jahrbuch für Computerphilologie, 5 (2003), 125–46,

41 Ibid.

42 George P. Landow, Hypertext 3.0: Critical Theory and New Media in an Era of Globalization, rev. ed. of Hypertext 2.0 1997 (Baltimore: Johns Hopkins University Press, 2006), p. xiv.

43 John Bryant, The Fluid Text: A Theory of Revision and Editing for Book and Screen (Ann Arbor: University of Michigan Press, 2002),

44 Dirk Van Hulle, Textual Awareness: A Genetic Study of Late Manuscripts by Joyce, Proust, and Mann (Ann Arbor: University of Michigan Press, 2004); Domenico Fiormonte and Cinzia Pusceddu, ‘The Text as a Product and as a Process: History, Genesis, Experiments’, in Manuscript, Variant, Genese—Genesis, ed. by Edward Vanhoutte and M. de Smedt (Gent: KANTL, 2006), pp. 109–28,

45 Katherine N. Hayles, How We Think: Digital Media and Contemporary Technogenesis (Chicago: University of Chicago Press, 2012).

46 Ben Brumfield, ‘The Collaborative Future of Amateur Editions’, Collaborative Manuscript Transcription, 2013,

47 Ray Siemens et al., ‘Toward Modeling the Social Edition: An Approach to Understanding the Electronic Scholarly Edition in the Context of New and Emerging Social Media’, Literary and Linguistic Computing, 27 (2012), 445–61. See also the chapter by Siemens et al. in this book (p. 137).

48 Steven E. Jones, Roberto Busa, S. J., and the Emergence of Humanities Computing (New York and London: Routledge, 2016).

49 Hockey, ‘The History of Humanities Computing’.

50 Dino Buzzetti, ‘Digital Editions and Text Processing’, in Text Editing, Print and the Digital World, ed. by Marilyn Deegan and Kathryn Sutherland (Farnham and Burlington: Ashgate, 2009), pp. 45–61; Franco Moretti, Graphs, Maps, Trees; Stephen Ramsay, Reading Machines: Toward an Algorithmic Criticism (Urbana-Champaign: University of Illinois Press, 2011); Matthew L. Jockers, Macroanalysis: Digital Methods and Literary History (Urbana-Champaign: University of Illinois Press, 2013).

51 Paul N. Edwards, ‘Hyper Text and Hypertension: Post-Structuralist Critical Theory, Social Studies of Science, and Software’, Social Studies of Science, 24 (1994), 229–78.

52 Cf. e.g. Kenneth M. Price, ‘Electronic Scholarly Editions’, in A Companion to Digital Literary Studies, ed. by Ray Siemens and Susan Schreibman (Oxford: Blackwell, 2008),

55 Franz Fischer, ‘About the HyperStack’, St. Patrick’s Confessio, 2011,

56 Peter Robinson, ‘Towards a Theory of Digital Editions’, Variants, 10 (2013), 105–31.

58 Elena Pierazzo, ‘A Rationale of Digital Documentary Editions’, Literary and Linguistic Computing, 26 (2011), 463–77,

59 Ben Brumfield, ‘The Collaborative Future of Amateur Editions’, Collaborative Manuscript Transcription, 2013,

60 Robinson, ‘Towards a Theory of Digital Editions’.