In September 2017 we started producing XML editions of most of our newly published titles. These new editions consist of an unformatted version of the book content in which textual elements are formally described according to their function within the text (e.g. a bibliographic entry is described as such in XML, rather than as a paragraph). This represents an important step in the process of making our books as accessible and reusable as possible. The XML files are offered for free download in order to encourage wide reuse, such as republication and conversion to different formats.
As part of this project, we have developed a set of tools to convert EPUB editions created with Adobe InDesign into XML files that follow the TEI simplePrint schema, an entry-level customization of the TEI Guidelines that has proven particularly well suited to encoding the vast majority of monographs published by Open Book Publishers (OBP).
The new format makes it possible to extract information from a book’s content programmatically. Building on this, we have created a simple tool to extract citation data, and we are now part of Crossref’s Cited-by program: https://www.crossref.org/services/cited-by/
The code for this tool is available on the OBP GitHub page, at https://github.com/OpenBookPublishers/Extract-citations
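To illustrate why functional markup makes this kind of extraction straightforward, here is a minimal Python sketch. It is not the code from the repository above. It assumes that bibliographic entries are encoded as TEI `<bibl>` elements in the standard TEI namespace, and it uses an invented sample document; the function name `extract_citations` and the sample entries are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard TEI namespace; elements in a TEI file are normally bound to it.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample document (assumption): real files will be larger and
# may organise their bibliography differently.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>An ordinary paragraph of running text.</p>
    </body>
    <back>
      <listBibl>
        <bibl>Doe, Jane, <title>An Invented Title</title> (Place: Publisher, 2000).</bibl>
        <bibl>Roe, Richard, <title>Another Invented Title</title> (Place: Publisher, 2010).</bibl>
      </listBibl>
    </back>
  </text>
</TEI>"""


def extract_citations(tei_xml: str) -> List[str]:
    """Return the plain text of every <bibl> element, in document order."""
    root = ET.fromstring(tei_xml)
    citations = []
    for bibl in root.iterfind(".//tei:bibl", TEI_NS):
        # Join the text of the entry and its child elements (e.g. <title>),
        # then collapse runs of whitespace into single spaces.
        text = " ".join("".join(bibl.itertext()).split())
        citations.append(text)
    return citations


if __name__ == "__main__":
    for citation in extract_citations(SAMPLE_TEI):
        print(citation)
```

Because each entry is tagged by function, the ordinary paragraph in the body is skipped without any guesswork about layout or typography; only the elements marked as bibliographic entries are returned.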