Overlapping markup
{{Short description|Non-hierarchical interaction of overlapping document markup entities}}
In markup languages and the digital humanities, overlap occurs when a document has two or more structures that interact in a non-hierarchical manner.
A document with overlapping markup cannot be represented as a tree.
This is also known as concurrent markup.
Overlap happens, for instance, in poetry, where there may be a metrical structure of feet and lines; a linguistic structure of sentences and quotations; and a physical structure of volumes and pages and editorial annotations.{{sfn|Text Encoding Initiative}}{{sfn|DeRose|2004|loc=The problem types}}
History
Image: inside cover of the 1831 edition of Frankenstein, a text that has been analysed with overlapping techniques.{{sfn|Piez|2014}}
The problem of non-hierarchical structures in documents has been recognised since 1988; resolving it against the dominant paradigm of text as a single hierarchy (an ordered hierarchy of content objects or OHCO) was initially thought to be merely a technical issue, but has, in fact, proven much more difficult.{{sfn|Renear|Mylonas|Durand|1993}}
In 2008, Jeni Tennison identified markup overlap as "the main remaining problem area for markup technologists".{{sfn|Tennison|2008}}
As of 2019, markup overlap remained a primary issue in the digital study of theological texts, and it was a major reason for the field retaining specialised markup formats (the Open Scripture Information Standard and the Theological Markup Language) rather than adopting the inter-operable Text Encoding Initiative-based formats common to the rest of the digital humanities.{{sfn|MoChridhe|2019}}
Properties and types
A distinction exists between schemes that allow non-contiguous overlap and those that allow only contiguous overlap; in the strict sense, 'markup overlap' often means the latter.
Contiguous overlap can always be represented as a linear document with milestones (typically co-indexed start- and end-markers), without the need for fragmenting a (logical) component into multiple physical ones. Non-contiguous overlap may require document fragmentation. Another distinction in overlapping markup schemes is whether elements can overlap with other elements of the same kind (self-overlap).{{sfn|DeRose|2004|loc=The problem types}}
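For contiguous components, whether two structures can share one tree reduces to whether their ranges nest. The following Python sketch assumes that each component is stored as a half-open (start, end) character-offset range over the Richard III fragment used as a running example later in this article; the representation and the function names are illustrative, not taken from any of the schemes cited. Two ranges overlap in the markup sense when they share text but neither contains the other.

```python
from itertools import combinations

Range = tuple[int, int]  # half-open [start, end) character offsets

LINES = [
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    "So much for that.\u2014The silent hours steal on,",  # \u2014 is an em dash
    "And flaky darkness breaks within the east.",
]
TEXT = "\n".join(LINES)


def find(sub: str) -> Range:
    """Offsets of the first occurrence of sub in TEXT."""
    start = TEXT.index(sub)
    return (start, start + len(sub))


def overlaps(a: Range, b: Range) -> bool:
    """True if a and b share text but neither contains the other."""
    (s1, e1), (s2, e2) = sorted([a, b])  # after sorting, the first range starts no later
    return s1 < s2 < e1 < e2


def is_hierarchical(ranges: list[Range]) -> bool:
    """True if every pair is disjoint or nested, i.e. the ranges fit in one tree."""
    return not any(overlaps(a, b) for a, b in combinations(ranges, 2))


line_ranges = [find(line) for line in LINES]
sentence_ranges = [
    (0, find("good.")[1]),
    find("So much for that."),
    (find("The silent")[0], len(TEXT)),
]

print(is_hierarchical(line_ranges))                    # True
print(is_hierarchical(sentence_ranges))                # True
print(is_hierarchical(line_ranges + sentence_ranges))  # False
# The only conflict is the third line against the third sentence:
print([p for p in combinations(line_ranges + sentence_ranges, 2) if overlaps(*p)])
# [((87, 131), (105, 174))]
```

A non-contiguous component would need a list of ranges rather than a single pair, which is where a plain start/end milestone encoding stops being sufficient.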
A scheme may have a privileged hierarchy.
Some XML-based schemes, for example, represent one hierarchy directly in the XML document tree and represent the other, overlapping structures by other means;
these other structures are said to be non-privileged.
{{harvtxt|Schmidt|2012}} identifies a tripartite classification of instances of overlap: 1. "Variation of content and structure", 2. "Overlay of multiple perspectives or markup sets", and 3. "Overlap of individual start and end tags within a single markup perspective";
additionally, some apparent instances of overlap are in fact schema definition problems, which can be resolved hierarchically.
He contends that type 1 is best resolved by a system of multiple documents external to the markup, but that types 2 and 3 must be dealt with inside the markup.
Approaches and implementations
{{harvtxt|DeRose|2004|loc=Evaluation criteria}} identifies several criteria for judging solutions to the overlap problem:
- readability and maintainability,
- tool support and compatibility with XML,
- possible validation schemes, and
- ease of processing.
Tag soup is, strictly speaking, not overlapping markup: it is malformed HTML, which is a non-overlapping language, and its interpretation may be ill-defined.
Some web browsers attempted to represent overlapping start and end tags with non-hierarchical Document Object Models (DOM), but this was not standardised across all browsers and was incompatible with the innately hierarchical nature of the DOM.{{sfn|Hickson|2002}}{{sfn|Sivonen|2003}}
HTML5 defines how processors should deal with such mis-nested markup in the HTML syntax and turn it into a single hierarchy.{{sfn|HTML|loc = [https://html.spec.whatwg.org/multipage/syntax.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser § 8.2.8 An introduction to error handling and strange cases in the parser]}}
With XHTML and SGML-based HTML, however, mis-nested markup is a strict error and makes processing by standards-compliant systems impossible.{{sfn|Sperberg-McQueen|Huitfeldt|2000|loc=2.1. Non-SGML Notations}}
The HTML standard defines a paragraph concept which can cause overlap with other elements and can be non-contiguous.{{sfn|HTML|loc = [https://html.spec.whatwg.org/multipage/dom.html#paragraphs § 3.2.5.4 Paragraphs]}}
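The strictness of XML-based processing can be seen with any XML parser. This Python sketch uses the standard library's xml.etree.ElementTree as a stand-in for an XHTML processor; it does not implement the HTML5 error-recovery rules, which would instead rewrite the mis-nested input into a single tree.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """True if an XML parser accepts the markup as a single tree."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


nested = "<p><b>bold <i>bold italic</i></b> plain</p>"
misnested = "<p><b>bold <i>bold italic</b> italic</i></p>"  # </b> closes while <i> is open

print(is_well_formed(nested))     # True
print(is_well_formed(misnested))  # False
```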
SGML, which early versions of HTML were based on, has a feature called CONCUR that allows multiple independent hierarchies to co-exist without privileging any.
DTD validation is only defined for each individual hierarchy with CONCUR. Validation across hierarchies is not defined by the standard. CONCUR cannot support self-overlap, and it interacts poorly with some of SGML's abbreviatory features.
This feature has been poorly supported by tools and has seen very little actual use;
using CONCUR to represent document overlap was not a recommended use case, according to a commentary by the standard's editor.{{sfn|Sperberg-McQueen|Huitfeldt|2000|loc=2.2. CONCUR}}{{sfn|DeRose|2004|loc=SGML CONCUR}}
= Within hierarchical languages =
There are several approaches to representing overlap in a non-overlapping language.{{sfn|Di Iorio|Peroni|Vitali|2009}}
The Text Encoding Initiative, as an XML-based markup scheme, cannot directly represent overlapping markup.
Its guidelines suggest all four of the approaches described below.{{sfn|Text Encoding Initiative|loc = [https://tei-c.org/release/doc/tei-p5-doc/en/html/NH.html § 20 Non-hierarchical Structures]}}
The Open Scripture Information Standard is another XML-based scheme, designed to mark up the Bible.
It uses empty milestone elements to encode non-privileged components.{{sfn|Durusau|2006}}
To illustrate these approaches, a fragment of Richard III by William Shakespeare is used as a running example, with both its sentences and its verse lines marked up. Where an approach has a privileged hierarchy, the lines are used as that hierarchy.
== Multiple documents ==
Multiple documents can each provide a different, internally consistent hierarchy. The advantage of this approach is that each document is simple and can be processed with existing tools; the disadvantages are that redundant content must be maintained and that cross-referencing between the different views can be difficult.{{sfn|Text Encoding Initiative|loc=[https://tei-c.org/release/doc/tei-p5-doc/en/html/NH.html#NHME § 20.1 Multiple Encodings of the Same Information]}} With multiple documents, the overlap can be analysed with data comparison and delta encoding techniques, and, in an XML context, specific XML tree differencing algorithms are available.{{sfn|Schmidt|2009}}{{sfn|La Fontaine|2016}}
{{harvtxt|Schmidt|2012|loc=3.5 Variation}} recommends this approach for encoding multiple variants of a single text, accepting the duplication of the parts that do not vary rather than attempting to create a structure that represents all of the variation present;
he further suggests that this alignment be performed automatically, and that misalignment is rare in practice.{{sfn|Schmidt|2012|loc=4.1 Automating Variation}}
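Example: in the document that marks up lines, each of the four verse lines is an element of its own. In the document that marks up sentences, the same text forms three elements: the first sentence spans two whole lines and ends "Who prays continually for Richmond's good."; the second is "So much for that."; and the third begins partway through the third line, at "The silent hours steal on,", and ends "And flaky darkness breaks within the east." Because the third sentence starts inside the third line and continues into the fourth, the two views cannot be combined into a single tree.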
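The two views can be held as separate, individually well-formed XML documents. This Python sketch uses hypothetical element names (lg/l for lines, p/s for sentences) and the standard library's xml.etree.ElementTree; it checks the main maintenance obligation of the approach, namely that both documents carry exactly the same text.

```python
import xml.etree.ElementTree as ET

DASH = "\u2014"  # em dash, written as an escape

# View 1 (hypothetical element names): one <l> element per verse line.
LINES_XML = f"""<lg><l>I, by attorney, bless thee from thy mother,</l>
<l>Who prays continually for Richmond's good.</l>
<l>So much for that.{DASH}The silent hours steal on,</l>
<l>And flaky darkness breaks within the east.</l></lg>"""

# View 2: the same text, one <s> element per sentence; the dash is left
# between sentences, an arbitrary editorial choice.
SENTENCES_XML = f"""<p><s>I, by attorney, bless thee from thy mother,
Who prays continually for Richmond's good.</s>
<s>So much for that.</s>{DASH}<s>The silent hours steal on,
And flaky darkness breaks within the east.</s></p>"""


def plain_text(xml: str) -> str:
    """All character data of a document, with the markup discarded."""
    return "".join(ET.fromstring(xml).itertext())


lines = ["".join(el.itertext()) for el in ET.fromstring(LINES_XML).iter("l")]
sentences = ["".join(el.itertext()) for el in ET.fromstring(SENTENCES_XML).iter("s")]

# Each view is an ordinary tree, but the shared text is stored twice and
# must be kept identical by hand or by tooling.
print(plain_text(LINES_XML) == plain_text(SENTENCES_XML))  # True
print(len(lines), len(sentences))  # 4 3
```

Every correction to the text now has to be made twice; a check like the one above is one way to catch the copies drifting apart.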
== Milestones ==
Milestones are empty elements that mark the beginning and end of a component, typically using the XML ID mechanism to indicate which "begin" element goes with which "end" element. Milestones can be used to embed a non-privileged structure within a hierarchical language; in their basic form, they can only represent contiguous overlap. Generic XML tools can parse the milestone elements, but they do not understand their special meaning, and so cannot easily process or validate the non-privileged structure.{{sfn|Text Encoding Initiative|loc=[https://tei-c.org/release/doc/tei-p5-doc/en/html/NH.html#NHBM § 20.2 Boundary Marking with Empty Elements]}}{{sfn|Sperberg-McQueen|Huitfeldt|2000|loc=2.4. Milestones}}
Milestones have the advantage that the markup for overlapping elements is located right at the relevant boundaries, like other markup, which helps maintainability and readability.{{sfn|DeRose|2004|loc=TEI-style milestones}} CLIX {{harv|DeRose|2004}} is an example of such an approach.
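A minimal Python sketch of the idea, assuming hypothetical milestone elements sStart (carrying an id) and sEnd (carrying a ref back to that id): the lines remain ordinary elements, and the sentences can only be recovered by code that knows the milestone convention, whereas a generic XML tool sees four lines and some empty elements.

```python
import xml.etree.ElementTree as ET

DASH = "\u2014"  # em dash, written as an escape

# Lines are the privileged hierarchy; sentences are marked by empty,
# co-indexed milestones (<sStart id="..."/> ... <sEnd ref="..."/>).
DOC = f"""<lg><l><sStart id="s1"/>I, by attorney, bless thee from thy mother,</l>
<l>Who prays continually for Richmond's good.<sEnd ref="s1"/></l>
<l><sStart id="s2"/>So much for that.<sEnd ref="s2"/>{DASH}<sStart id="s3"/>The silent hours steal on,</l>
<l>And flaky darkness breaks within the east.<sEnd ref="s3"/></l></lg>"""


def events(elem):
    """Yield ('start', id), ('end', id) and ('text', str) items in document order."""
    if elem.tag == "sStart":
        yield ("start", elem.get("id"))
    elif elem.tag == "sEnd":
        yield ("end", elem.get("ref"))
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        yield from events(child)
        if child.tail:
            yield ("text", child.tail)


def sentences_from_milestones(doc: str) -> dict[str, str]:
    """Rebuild the non-privileged sentences from start/end milestone pairs."""
    open_parts: dict[str, list[str]] = {}
    finished: dict[str, str] = {}
    for kind, value in events(ET.fromstring(doc)):
        if kind == "start":
            open_parts[value] = []
        elif kind == "end":
            if value not in open_parts:
                raise ValueError(f"end milestone without start: {value!r}")
            finished[value] = "".join(open_parts.pop(value))
        else:
            for parts in open_parts.values():  # text belongs to every open sentence
                parts.append(value)
    if open_parts:
        raise ValueError(f"unclosed milestones: {sorted(open_parts)}")
    return finished


print(len(ET.fromstring(DOC).findall("l")))  # 4: a generic tool sees only the lines
for sid, text in sentences_from_milestones(DOC).items():
    print(sid, repr(text))
```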
Punctuation and spaces have been identified as a type of milestone-style 'crypto-overlap' or 'pseudo-markup', as the boundaries of words, clauses, sentences and the like do not necessarily align with the formal markup boundaries hierarchically.{{sfn|Birnbaum|Thorsen|2015}}{{sfn|Haentjens Dekker|Birnbaum|2017}}
It is also possible to use more complex milestones to represent non-contiguous structures. For example, TAGML's "suspend" and "resume" semantics{{sfn|Dekker|2018}} can be expressed with milestones by adding an attribute that indicates whether each milestone represents a start, suspend, resume, or end point. Re-ordering and even self-overlap can be achieved similarly, by annotating each milestone with a "next chunk" reference.
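The start/suspend/resume/end scheme can be sketched in Python as a small state machine per element. The element name m and its mode attribute are hypothetical, not TAGML's own syntax, and treating the clause "I ... bless thee from thy mother" as one element interrupted by "by attorney" is purely an illustration; re-ordering via "next chunk" references is not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical milestone vocabulary: <m id="..." mode="start|suspend|resume|end"/>.
DOC = ('<l><m id="c1" mode="start"/>I<m id="c1" mode="suspend"/>, by attorney, '
       '<m id="c1" mode="resume"/>bless thee from thy mother<m id="c1" mode="end"/>,</l>')

# Legal (mode, current state) -> new state transitions for one element.
TRANSITIONS = {
    ("start", None): "open",
    ("suspend", "open"): "suspended",
    ("resume", "suspended"): "open",
    ("end", "open"): "closed",
}


def flatten(elem):
    """Yield milestone elements and text strings in document order."""
    if elem.tag == "m":
        yield elem
    if elem.text:
        yield elem.text
    for child in elem:
        yield from flatten(child)
        if child.tail:
            yield child.tail


def chunks(doc: str) -> dict[str, list[str]]:
    """Return the contiguous pieces of each milestone-delimited element."""
    state: dict[str, str] = {}
    pieces: dict[str, list[str]] = {}
    for item in flatten(ET.fromstring(doc)):
        if isinstance(item, str):
            for ident, current in state.items():
                if current == "open":  # suspended elements do not collect text
                    pieces[ident][-1] += item
            continue
        ident, mode = item.get("id"), item.get("mode")
        new_state = TRANSITIONS.get((mode, state.get(ident)))
        if new_state is None:
            raise ValueError(f"illegal {mode!r} for {ident!r} in state {state.get(ident)!r}")
        if mode in ("start", "resume"):
            pieces.setdefault(ident, []).append("")  # a new contiguous chunk begins
        state[ident] = new_state
    unfinished = sorted(i for i, s in state.items() if s != "closed")
    if unfinished:
        raise ValueError(f"elements never ended: {unfinished}")
    return pieces


print(chunks(DOC))  # {'c1': ['I', 'bless thee from thy mother']}
```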
== Joins ==
Joins are pointers within a privileged hierarchy to other components of the privileged hierarchy, which may be used to reconstruct a non-privileged component akin to following a linked list. A single non-privileged element is segmented into several partial elements within the privileged hierarchy; the partial elements themselves do not represent a single unit in the non-privileged hierarchy, which can be misleading and make processing difficult.{{sfn|Text Encoding Initiative|loc=[https://tei-c.org/release/doc/tei-p5-doc/en/html/NH.html#NHVE § 20.3 Fragmentation and Reconstitution of Virtual Elements]}}{{sfn|DeRose|2004|loc=Segmentation}} While this approach can support some discontiguous structures, it is not able to re-order elements.{{sfn|Sperberg-McQueen|Huitfeldt|2000|loc=2.5. Fragmentation}} A slightly different approach can, however, express re-ordering by recording the join separately from the content, at the cost of directness and maintainability.{{sfn|DeRose|2004|loc=Joins}}
Join-based representations can introduce the possibility of cycles between elements; detecting and rejecting these adds complexity to implementations.{{sfn|Schmidt|2012|loc=3.4 Interlinking}}
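A minimal Python sketch of joins, assuming a hypothetical next attribute on each sentence fragment (the attribute and element names are illustrative). Each fragment on its own is only part of a sentence; the reconstruction follows the pointers from each chain head and rejects dangling links, fragments reached twice, and cycles that have no head at all.

```python
import xml.etree.ElementTree as ET

DASH = "\u2014"  # em dash, written as an escape

# Lines are privileged; each sentence fragment is an <s> inside a line, and a
# hypothetical "next" attribute points to the fragment that continues it.
DOC = f"""<lg><l><s id="s1a" next="s1b">I, by attorney, bless thee from thy mother,</s></l>
<l><s id="s1b">Who prays continually for Richmond's good.</s></l>
<l><s id="s2">So much for that.</s>{DASH}<s id="s3a" next="s3b">The silent hours steal on,</s></l>
<l><s id="s3b">And flaky darkness breaks within the east.</s></l></lg>"""


def join_fragments(doc: str, sep: str = " ") -> dict[str, str]:
    """Follow next-pointers from each chain head, rejecting cycles and dangling links."""
    frags = {s.get("id"): s for s in ET.fromstring(doc).iter("s")}
    targets = {s.get("next") for s in frags.values() if s.get("next")}
    dangling = targets - set(frags)
    if dangling:
        raise ValueError(f"next points to unknown fragments: {sorted(dangling)}")
    seen: set[str] = set()
    joined: dict[str, str] = {}
    for head in (i for i in frags if i not in targets):  # heads, in document order
        parts, current = [], head
        while current is not None:
            if current in seen:  # reached twice: a cycle or a shared fragment
                raise ValueError(f"fragment {current!r} reached twice")
            seen.add(current)
            parts.append("".join(frags[current].itertext()))
            current = frags[current].get("next")
        joined[head] = sep.join(parts)
    unreached = set(frags) - seen  # members of a cycle that has no head
    if unreached:
        raise ValueError(f"cycle among fragments: {sorted(unreached)}")
    return joined


def rejects(doc: str) -> bool:
    """True if join_fragments refuses the document."""
    try:
        join_fragments(doc)
    except ValueError:
        return True
    return False


CYCLE = '<p><s id="a" next="b">x</s><s id="b" next="a">y</s></p>'
print(join_fragments(DOC)["s2"])  # So much for that.
print(rejects(CYCLE))             # True
```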
== Stand-off markup ==
Stand-off markup is similar to using joins, except that there may be no privileged hierarchy: each part of the document is given a label (or might be referred to by an offset), and the document structure is expressed by pointing to the content from markup that 'stands off' from the content (possibly in an entirely different file), and might contain no content itself. The TEI guidelines identify the unity of the elements as a primary advantage of stand-off markup over joins, in addition to the ability to produce and distribute annotations separately from the text, possibly even by different authors applying markup to a read-only document,{{sfn|Text Encoding Initiative|loc= [https://tei-c.org/release/doc/tei-p5-doc/en/html/NH.html#NHSO § 20.4 Stand-off Markup]}} allowing collaborative approaches to markup by a divide and conquer strategy.{{sfn|Schmidt|2012|loc=4.2 Markup Outside the Text}}
Example:
I, by attorney, bless thee from thy mother,
Who prays continually for Richmond's good.
So much for that.—The silent hours steal on,
And flaky darkness breaks within the east.
...
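A minimal Python sketch of stand-off markup over the same fragment: the content is plain text with no markup, and the annotations are a separate list of records pointing into it by character offsets, serialised here as JSON. The "type"/"target" record layout is a hypothetical schema chosen for illustration.

```python
import json

DASH = "\u2014"  # em dash, written as an escape

# The content: plain text with no markup at all (it could be read-only).
TEXT = "\n".join([
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    f"So much for that.{DASH}The silent hours steal on,",
    "And flaky darkness breaks within the east.",
])


def span(sub: str) -> list[int]:
    """Half-open [start, end) offsets of the first occurrence of sub in TEXT."""
    start = TEXT.index(sub)
    return [start, start + len(sub)]


# The annotations: overlapping lines and sentences, stored apart from TEXT.
annotations = [{"type": "line", "target": span(line)} for line in TEXT.split("\n")]
annotations += [
    {"type": "sentence", "target": [0, span("good.")[1]]},
    {"type": "sentence", "target": span("So much for that.")},
    {"type": "sentence", "target": [span("The silent")[0], len(TEXT)]},
]
STANDOFF_JSON = json.dumps(annotations, indent=1)  # could live in another file


def resolve(text: str, standoff_json: str, kind: str) -> list[str]:
    """Return the text covered by each annotation of the given type."""
    selected = []
    for ann in json.loads(standoff_json):
        if ann["type"] == kind:
            start, end = ann["target"]
            selected.append(text[start:end])
    return selected


print(resolve(TEXT, STANDOFF_JSON, "sentence")[1])  # So much for that.
```

Lines and sentences coexist without difficulty because neither is nested in the other; the trade-off is that any edit to the text invalidates the offsets unless the annotations are updated along with it.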
It has been claimed that separating markup from text can simplify documents overall and increase maintainability,{{sfn|Eggert|Schmidt|2019|loc=Conclusion}} and a 2017 handbook chapter stated that "[t]he current state of the art to [represent] (...) linguistically annotated data is to use a graph-based representation serialized as standoff XML as a pivot format",{{sfn|Ide|Chiarcos|Stede|Cassidy|2017|loc=p.99}} i.e., that standoff was the most widely accepted approach to the overlapping markup challenge.
Standoff formalisms have been the basis for an ISO standard for linguistic annotation,{{Cite web|url=https://www.iso.org/standard/37326.html|title=ISO 24612:2012|website=ISO}} have been successfully applied in developing corpus management systems,{{sfn|Chiarcos|Dipper|Götze|Leser|2008|}} and (as of April 2020) were being actively developed in the TEI.{{cite web|url=https://github.com/TEIC/TEI/issues/1745|title=Standoff: Annotation microstructure · Issue #1745 · TEIC/TEI|website=GitHub}} One published example of a successful stand-off annotation scheme was developed as part of a bitext natural language documentation project focused on the preservation of low-resource or endangered languages.{{cite journal |last1=Xia |first1=F. |last2=Lewis |first2=W. D. |last3=Goodman |first3=M. W. |display-authors=etal |title=Enriching a massively multilingual database of interlinear glossed text |journal=Language Resources and Evaluation |volume=50 |pages=321–349 |date=2016 |doi=10.1007/s10579-015-9325-4}}
== Challenges ==
Representing overlapping markup within hierarchical languages is challenging, for reasons of redundancy and/or complexity. In the 2000s to 2010s, standoff formalisms were generally accepted as the most promising approach here,{{sfn|Ide|Chiarcos|Stede|Cassidy|2017|loc=p.99}} but a disadvantage of standoff is that validation is very challenging.{{sfn|Sperberg-McQueen|Huitfeldt|2000|loc=2.6. Standoff Markup}}
Standoff formalisms are not natively supported by database management systems, so by 2017 it was suggested to "use ... standoff XML as a pivot format (...) and relational data bases for querying."{{sfn|Ide|Chiarcos|Stede|Cassidy|2017|loc=p.99}} In practical applications, this requires complicated architectures, labor-intensive transformations between the pivot format and the internal representation, or both. As a result, maintenance is problematic.{{sfn|DeRose|2004|loc=Standoff markup}} This has motivated the development of corpus management systems based on graph databases and the use of established graph-based formalisms as pivot formats.
= Special-purpose languages =
To implement the strategies above, either existing markup languages (such as the TEI) can be extended or special-purpose languages can be designed. Designing an entirely new markup language means giving up the tool support available for existing languages in exchange for a less complicated semantic model and a more convenient syntax.
== Historical formalisms ==
- LMNL is a non-hierarchical markup language first described in 2002 by Jeni Tennison and Wendell Piez, annotating ranges of a document with properties and allowing self-overlap. CLIX, which originally stood for 'Canonical LMNL In XML', provides a method for representing any LMNL document in a milestone-style XML document.{{sfn|DeRose|2004|loc=CLIX and LMNL}} It also has another XML serialisation, xLMNL.{{sfn|Piez|2012}}
- MECS was developed by the University of Bergen's Wittgenstein Archive. However, it had several problems: it allowed some nonsensical documents containing overlapping elements, it could not support self-overlap, and it did not have the capacity to define a DTD-like grammar.{{sfn|Sperberg-McQueen|Huitfeldt|2000|loc=2.7. MECS}} The theory of General Ordered-Descendant Directed Acyclic Graphs (GODDAGs), while not strictly a markup language itself, is a general data model for non-hierarchical markup. Restricted GODDAGs were designed specifically to match the semantics of MECS; general GODDAGs may be non-contiguous and need a more powerful language.{{sfn|Sperberg-McQueen|Huitfeldt|2000}} TexMECS is a successor to MECS; it has a formal grammar and is designed to represent every GODDAG and nothing that is not a GODDAG.{{sfn|Huitfeldt|Sperberg-McQueen|2003}}
- XCONCUR (previously MuLaX) is a melding-together of XML and SGML's CONCUR, and also contains a validation language, XCONCUR-CL, and a SAX-like API.{{sfn|Hilbert|Schonefeld|Witt|2005}}{{sfn|Witt|Schonefeld|Rehm|Khoo|2007}}{{sfn|Schonefeld|2008}}
- Marinelli, Vitali and Zacchiroli provide algorithms to convert between restricted GODDAGs, ECLIX, LMNL, parallel documents in XML, contiguous stand-off markup and TexMECS.{{sfn|Marinelli|Vitali|Zacchiroli|2008}}
None of these formalisms appears to be maintained any longer; the community consensus seems to be to use standoff XML or graph-based formalisms instead.
== Actively maintained standoff XML languages ==
- GrAF-XML,{{cite web|url=https://sourceforge.net/projects/iso-graf/|title=ISO GrAF|date=7 March 2015 }} standoff-XML serialization of the Linguistic Annotation Framework (LAF), used, e.g., for the American National Corpus{{cite web |url=http://www.anc.org/ |title=Home |website=anc.org}}
- PAULA-XML,{{Cite web | url=https://www.sfb632.uni-potsdam.de/en/paula.html | title=PAULA XML: Interchange Format for Linguistic Annotations | archive-url=https://web.archive.org/web/20200817223328/https://www.sfb632.uni-potsdam.de/en/paula.html | archive-date=2020-08-17}} standoff-XML serialization of the data model underlying the corpus management system ANNIS and the converter suite SALT{{cite web |url=https://corpus-tools.org/salt/ |title=Salt |doi=10.5281/zenodo.17557 |publisher=corpus-tools.org |date=2016-11-18 |access-date=2022-09-11|last1=Zipser |first1=Florian }}
- NAF (NLP Annotation Format / Newsreader Annotation Format),{{cite web|url=https://github.com/newsreader/NAF|title=NAF|website=GitHub|date=30 June 2021}} standoff XML format originally developed in the NewsReader project (FP7, 2013-2015{{cite web |url=https://cordis.europa.eu/project/id/316404 |title=Building structured event indexes of large volumes of financial and economic data for decision making |website=Community Research and Development Information Service (CORDIS) }}), currently used by NLP tools such as FreeLing{{cite web |url=http://nlp.lsi.upc.edu/freeling/ |title=Home - FreeLing Home Page |access-date=2020-04-06 |archive-url=https://web.archive.org/web/20120429043610/http://nlp.lsi.upc.edu/freeling/ |archive-date=2012-04-29 |url-status=dead }} (with support for English, Spanish, Portuguese, Italian, French, German, Russian, Catalan, Galician, Croatian, Slovene, etc.), and EusTagger{{cite web | url=http://www.hitz.eus/en/nlp | title=Text Analysis | HiTZ Zentroa }} (with support for Basque, English, Spanish).
- The Charles Harpur Critical Archive is encoded using 'multi-version documents' (MVD) to represent the variant versions of documents and as a means of indicating additions, deletions and revisions using a tactical combination of multiple documents and stand-off ranges within an underlying graph-based model. MVD is presented as an application file format, requiring specialised tools to view or edit.{{sfn|Eggert|Schmidt|2019}}
- A standoff XML scheme was developed by the Odin, Intent, and XigtEdit collaboration, which is focused on a large dataset of Interlinear Glossed Text (IGT) for supporting natural language resource and documentation projects.
Standoff approaches have two parts, commonly called the "content" and the "annotations", which can be expressed in unrelated representations. Simple standoff annotations per se involve no more than a list of (location, type) pairs. Thus, in a few applications{{Examples|date=July 2020}} standoff annotations are expressed in CSV, JSON (including JSON-LD, e.g., Web Annotation{{cite web|url=https://www.w3.org/TR/annotation-model/|title=Web Annotation Data Model|date=23 February 2017 }}) or other representations, or in graph formalisms grounded in string URIs (see below). However, representing and validating content in such representations is much more difficult and much less common.
= Graph-based formalisms =
Standoff markup employs a data model based on directed graphs,{{sfn|Ide|Suderman|2007}} thus complicating its representation when grounding markup information in a tree. Representing overlapping hierarchies in a graph eliminates this challenge. Standoff annotations can thus be more adequately represented as generalised directed multigraphs and use formalisms and technologies developed for this purpose, most notably those based on the Resource Description Framework (RDF).{{sfn|Cassidy|2010|loc=cassidy}}{{sfn|Chiarcos|2012|loc=POWLA}}
EARMARK is an early RDF/OWL representation that encompasses General Ordered-Descendant Directed Acyclic Graphs (GODDAGs), the general data model for non-hierarchical markup described above.{{sfn|Di Iorio|Peroni|Vitali|2009}}
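The core idea of a GODDAG, that text leaves may have parents in more than one hierarchy, can be sketched in Python. This is a deliberately flattened illustration, not EARMARK or a full GODDAG implementation: it has no root nodes and no nesting of elements within a hierarchy, only edges from line and sentence elements to the shared text segments they cover. The result is acyclic (edges run only from elements to leaves) but is not a tree, because some leaves have two parents.

```python
from collections import defaultdict

DASH = "\u2014"  # em dash, written as an escape
TEXT = "\n".join([
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    f"So much for that.{DASH}The silent hours steal on,",
    "And flaky darkness breaks within the east.",
])


def span(sub: str) -> tuple[int, int]:
    """Half-open [start, end) offsets of the first occurrence of sub in TEXT."""
    start = TEXT.index(sub)
    return (start, start + len(sub))


# Element nodes of two hierarchies, as half-open character ranges.
elements = {f"line{n}": span(line) for n, line in enumerate(TEXT.split("\n"), 1)}
elements.update({
    "s1": (0, span("good.")[1]),
    "s2": span("So much for that."),
    "s3": (span("The silent")[0], len(TEXT)),
})

# Leaf nodes: the text segments delimited by every element boundary.
cuts = sorted({0, len(TEXT)} | {b for rng in elements.values() for b in rng})
leaves = list(zip(cuts, cuts[1:]))

# Directed edges element -> leaf; the two hierarchies meet only at shared leaves.
children = {name: [lf for lf in leaves if s <= lf[0] and lf[1] <= e]
            for name, (s, e) in elements.items()}
parents = defaultdict(list)
for name, kids in children.items():
    for lf in kids:
        parents[lf].append(name)

shared = {lf: names for lf, names in parents.items() if len(names) > 1}
print(TEXT[105:131], parents[(105, 131)])  # The silent hours steal on, ['line3', 's3']
```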
RDF is a semantic data model that is independent of any single linearisation, and several linearisations exist, including an XML format (RDF/XML) that can be modeled to mirror standoff XML, a linearisation that lets RDF be expressed in XML attributes (RDFa), a JSON format (JSON-LD), and binary formats designed to facilitate querying or processing (RDF-HDT,{{cite web |url=http://www.rdfhdt.org/ |title=Home |website=rdfhdt.org}} RDF-Thrift{{Cite web|url=https://afs.github.io/rdf-thrift/|title=RDF Binary using Apache Thrift|website=afs.github.io}}). RDF is semantically equivalent to the graph-based data models underlying standoff markup, and it does not require special-purpose technology for storing, parsing and querying. Multiple interlinked RDF files representing a document or a corpus constitute an example of Linguistic Linked Open Data.
An established technique for linking arbitrary graphs with an annotated document is to use URI fragment identifiers to refer to parts of a text or document (see the overview under Web annotation). The Web Annotation standard additionally provides format-specific "selectors", e.g., offset-, string-match- or XPath-based selectors.{{cite web|url=https://w3c.github.io/web-annotation/selector-note/|title=Selectors and States|date=23 February 2017 }}
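A Python sketch of an offset-based and a string-match-based selector pointing at the same segment. The annotation is shaped after the W3C Web Annotation Data Model (Annotation, TextualBody, TextPositionSelector with start and an exclusive end, TextQuoteSelector with exact and an optional prefix and suffix); the example.org IRIs and the "sentence" tag are hypothetical, and the resolver is a simplified sketch that takes the first match of a quote. Python string indices count code points, the unit these positions are counted in.

```python
import json

DASH = "\u2014"  # em dash, written as an escape
TEXT = "\n".join([
    "I, by attorney, bless thee from thy mother,",
    "Who prays continually for Richmond's good.",
    f"So much for that.{DASH}The silent hours steal on,",
    "And flaky darkness breaks within the east.",
])

# Hypothetical annotation tagging the third sentence, with two selectors.
annotation = {
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/anno/s3",
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "sentence", "purpose": "tagging"},
    "target": {
        "source": "http://example.org/richard3.txt",
        "selector": [
            {"type": "TextPositionSelector", "start": 105, "end": 174},
            {"type": "TextQuoteSelector",
             "exact": "The silent hours steal on,\nAnd flaky darkness breaks within the east.",
             "prefix": f"that.{DASH}"},
        ],
    },
}
serialized = json.dumps(annotation, indent=2)  # the JSON-LD document as exchanged


def select(text: str, selector: dict) -> str:
    """Resolve one selector against the source text (simplified sketch)."""
    if selector["type"] == "TextPositionSelector":
        return text[selector["start"]:selector["end"]]  # end is exclusive
    if selector["type"] == "TextQuoteSelector":
        prefix = selector.get("prefix", "")
        hit = text.find(prefix + selector["exact"] + selector.get("suffix", ""))
        if hit < 0:
            raise LookupError("quoted text not found")
        start = hit + len(prefix)
        return text[start:start + len(selector["exact"])]
    raise NotImplementedError(selector["type"])


picked = [select(TEXT, sel) for sel in annotation["target"]["selector"]]
print(picked[0] == picked[1])  # True: both selectors pick out the third sentence
```

Carrying a quote alongside the offsets gives a consumer a way to detect that the offsets have gone stale after the text was edited.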
Native RDF vocabularies capable of representing linguistic annotations include:{{cite book |last1=Cimiano |first1=Philipp |last2=Chiarcos |first2=Christian |last3=McCrae |first3=John P. |last4=Gracia |first4=Jorge |title=Linguistic Linked Data. Representation, Generation and Applications |date=2020 |publisher=Springer |location=Cham }}
- Web Annotation{{cite journal |last1=Verspoor |first1=Karin |author-link=Karin Verspoor |last2=Livingston |first2=Kevin |date=2012 |title=Towards Adaptation of Linguistic Annotations to Scholarly Annotation Formalisms on the Semantic Web |url=https://www.aclweb.org/anthology/W12-3610/ |journal=Proceedings of the Sixth Linguistic Annotation Workshop, Jeju, Republic of Korea |pages=75–84 |access-date=6 April 2020}}
- NLP Interchange Format (NIF){{cite web|url=https://persistence.uni-leipzig.org/nlp2rdf/|title = NLP Interchange Format (NIF) 2.0 - Overview and Documentation}}
- LAPPS Interchange Format (LIF){{cite web|url=https://wiki.lappsgrid.org/interchange/overview.html|title = LIF Overview}}
Related vocabularies include:
- POWLA, an OWL2/DL serialization of PAULA-XML{{cite web|url=http://purl.org/powla|title=POWLA|date=January 2022}}
- RDF-NAF, an RDF serialization of the NLP Annotation Format{{Cite web|url=http://wordpress.let.vupr.nl/naf/|title=NLP Annotation Format | Background information on NAF}}
In early 2020, the W3C Community Group LD4LT launched an initiative to harmonize these vocabularies and to develop a consolidated RDF vocabulary for linguistic annotations on the web.{{cite web|url=https://github.com/ld4lt/linguistic-annotation|title = Towards a consolidated LOD vocabulary for linguistic annotations|website = GitHub|date = 7 September 2021}}
Notes
{{reflist|30em}}
References
- {{ cite conference
| chapter = Markup and meter: Using XML tools to teach a computer to think about versification
| first1 = David J | last1 = Birnbaum | first2 = Elise | last2 = Thorsen
| title = Proceedings of Balisage: The Markup Conference 2015 | date = 2015
| volume = 15 | chapter-url = http://www.balisage.net/Proceedings/vol15/html/Birnbaum01/BalisageVol15-Birnbaum01.html
| conference = Balisage: The Markup Conference 2015 | location = Montréal
| doi = 10.4242/BalisageVol15.Birnbaum01
| isbn = 978-1-935958-11-6 }}
- {{ cite conference | last = Cassidy | first = Steve | title = An RDF realisation of LAF in the DADA annotation server | conference = Proceedings of ISA-5 | url = http://web.science.mq.edu.au/~cassidy/wordpress/wp-content/uploads/2006/07/paper.pdf | location = Hong Kong | date = 2010 | citeseerx = 10.1.1.454.9146 }}
- {{ cite conference | first = Christian | last = Chiarcos | title = The Semantic Web: Research and Applications | chapter = POWLA: Modeling linguistic corpora in OWL/DL | series = Lecture Notes in Computer Science | date = 2012 | volume = 7295 | access-date = 2016-05-24 | conference = Proceedings of the 9th Extended Semantic Web Conference (ESWC 2012, Heraklion, Crete; LNCS 7295) | pages = 225–239 | chapter-url = http://acoli.cs.uni-frankfurt.de/bibtex/papers/chiarcos-2012-powla-eswc.pdf | doi = 10.1007/978-3-642-30284-8_22 | isbn = 978-3-642-30283-1 | doi-access = free }}
- {{cite journal |last1=Chiarcos |first1=Christian |last2=Dipper |first2=Stefanie |last3=Götze |first3=Michael |last4=Leser |first4=Ulf |last5=Lüdeling |first5=Anke |last6=Ritz |first6=Julia |last7=Stede |first7=Manfred |title=A flexible framework for integrating annotations from different tools and tagsets |journal=Traitement Automatique des Langues |date=2008 |volume=49 |issue=2 |pages=271–293 |url=https://www.atala.org/content/flexible-framework-integrating-annotations-different-tools-and-tag-sets}}
- {{ cite conference
| chapter = TAGML: A markup language of many dimensions
| conference = Balisage: The Markup Conference 2018 | location = Rockville, MD
| first1 = Ronald Haentjens | last1 = Dekker
| first2 = Elli | last2 = Bleeker
| first3 = Bram | last3 = Buitendijk
| first4 = Astrid | last4 = Kulsdom
| first5 = David J | last5 = Birnbaum
| title = Proceedings of Balisage: The Markup Conference 2018 | date = 2018
| volume = 21 | chapter-url = https://www.balisage.net/Proceedings/vol21/html/HaentjensDekker01/BalisageVol21-HaentjensDekker01.html
| doi = 10.4242/BalisageVol21.HaentjensDekker01
| isbn = 978-1-935958-18-5 | doi-access = free}}
- {{cite conference | first = Steven | last = DeRose | author-link = Steven DeRose | year = 2004 | title = Markup Overlap: A Review and a Horse | url = http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html | conference = Extreme Markup Languages 2004 | location = Montréal | citeseerx = 10.1.1.108.9959 | access-date = 2014-10-14 | archive-date = 2014-10-17 | archive-url = https://web.archive.org/web/20141017175806/http://conferences.idealliance.org/extreme/html/2004/DeRose01/EML2004DeRose01.html | url-status = dead }}
- {{ cite conference
| chapter = Towards markup support for full GODDAGs and beyond: the EARMARK approach
| first1 = Angelo | last1 = Di Iorio | first2 = Silvio | last2 = Peroni | first3 = Fabio | last3 = Vitali
| title = Proceedings of Balisage: The Markup Conference 2009 | date = August 2009
| volume = 3 | chapter-url = http://www.balisage.net/Proceedings/vol3/html/Peroni01/BalisageVol3-Peroni01.html
| conference = Balisage: The Markup Conference 2009 | location = Montréal | doi = 10.4242/BalisageVol3.Peroni01
| isbn = 978-0-9824344-2-0 }}
- {{ cite journal | first1 = Paul | last1 = Eggert | first2 = Desmond A | last2 = Schmidt | date = 2019 | url = https://ecommons.luc.edu/english_facpubs/55/ | title = The Charles Harpur Critical Archive: A History and Technical Report | journal = International Journal of Digital Humanities | volume = 1 | number = 1 | access-date = 2019-03-25 }}
- {{ cite conference
| chapter = It's more than just overlap: Text As Graph
| first1 = Ronald | last1 = Haentjens Dekker | first2 = David J | last2 = Birnbaum
| title = Proceedings of Balisage: The Markup Conference 2017 | date = 2017
| volume = 19 | chapter-url = https://www.balisage.net/Proceedings/vol19/html/Dekker01/BalisageVol19-Dekker01.html
| conference = Balisage: The Markup Conference 2017 | location = Montréal
| doi = 10.4242/BalisageVol19.Dekker01
| isbn = 978-1-935958-15-4 | doi-access = free }}
- {{ cite book | title = OSIS Users Manual (OSIS Schema 2.1.1) | first = Patrick | last = Durusau | url = http://img.forministry.com/7/7B/7BB51FB8-84B3-4FF3-939ED473FA90A632/DOC/OSIS2_1UserManual_06March2006_-_with_O%27Donnell_edits.PDF | date = 2006 | archive-url = https://web.archive.org/web/20141023105349/http://img.forministry.com/7/7B/7BB51FB8-84B3-4FF3-939ED473FA90A632/DOC/OSIS2_1UserManual_06March2006_-_with_O'Donnell_edits.PDF | archive-date = 2014-10-23 | access-date = 2014-10-14 }}
- {{cite web | url = http://ln.hixie.ch/?count=1&start=1037910467 | title = Tag Soup: How UAs handle <x> <y> </x> </y> | last = Hickson | first = Ian | date = 2002-11-21 | access-date = 2017-11-05 | author-link = Ian Hickson }}
- {{ cite conference | first1 = Mirco | last1 = Hilbert | first2 = Oliver | last2 = Schonefeld | first3 = Andreas | last3 = Witt
| title = Making CONCUR work
| url = http://conferences.idealliance.org/extreme/html/2005/Witt01/EML2005Witt01.xml | access-date = 2014-10-14
| conference = Extreme Markup Languages 2005 | date = 2005 | location = Montréal
| citeseerx = 10.1.1.104.634
}}
- {{cite web | title = TexMECS: An experimental markup meta-language for complex documents | first1 = Claus | last1 = Huitfeldt | first2 = C M | last2 = Sperberg-McQueen | url = http://mlcd.blackmesatech.com/mlcd/2003/Papers/texmecs.html | archive-url = https://web.archive.org/web/20170227202055/http://mlcd.blackmesatech.com/mlcd/2003/Papers/texmecs.html | archive-date = 2017-02-27 | date = 2003 | access-date = 2014-10-14 }}
- {{cite book |last1=Ide |first1=Nancy |last2=Chiarcos |first2=Christian |last3=Stede |first3=Manfred |last4=Cassidy |first4=Steve |chapter=Designing Annotation Schemes: From Model to Representation |title=Handbook of Linguistic Annotation |editor1-first=Nancy |editor1-last=Ide |editor2-first=James |editor2-last=Pustejovsky |page=99 |date=2017 |doi=10.1007/978-94-024-0881-2_3 |publisher=Springer |location=Dordrecht|isbn=978-94-024-0879-9 }}
- {{ cite conference
| chapter = Representing Overlapping Hierarchy as Change in XML
| first1 = Robin | last1 = La Fontaine
| title = Proceedings of Balisage: The Markup Conference 2016 | date = 2016
| volume = 17 | chapter-url = http://www.balisage.net/Proceedings/vol17/html/LaFontaine01/BalisageVol17-LaFontaine01.html
| conference = Balisage: The Markup Conference 2016 | location = Montréal
| doi = 10.4242/BalisageVol17.LaFontaine01
| isbn = 978-1-935958-13-0 }}
- {{ cite journal | first1 = Paolo | last1 = Marinelli | first2 = Fabio | last2 = Vitali | first3 = Stefano | last3 = Zacchiroli | author3-link = Stefano Zacchiroli
| title = Towards the unification of formats for overlapping markup
| url = http://upsilon.cc/~zack/research/publications/nrhm-overlapping-conversions.pdf
| access-date = 2014-10-14 | journal = New Review of Hypermedia and Multimedia | date = January 2008 | volume = 14 | issue = 1 | pages = 57–94 | issn = 1361-4568 | doi = 10.1080/13614560802316145
| citeseerx = 10.1.1.383.1636 | s2cid = 16909224 }}
- {{ cite journal | last = MoChridhe | first = Race J
| title = Twenty Years of Theological Markup Languages: A Retro- and Prospective
| journal = Theological Librarianship | volume = 12 | issue = 1
| issn = 1937-8904
| url = https://0-theolib-atla-com.librarycatalog.vts.edu/theolib/article/view/523
| doi = 10.31046/tl.v12i1.523
| date = 2019-04-24
| s2cid = 171582852
| access-date = 2019-07-15
| doi-access = free
}}
- {{ cite conference | first = Wendell | last = Piez | title = Proceedings of Balisage: The Markup Conference 2012 | date = August 2012 | chapter = Luminescent: parsing LMNL by XSLT upconversion | volume = 8 | conference = Balisage: The Markup Conference 2012 | location = Montréal | doi = 10.4242/BalisageVol8.Piez01 | isbn = 978-1-935958-04-8 | chapter-url = http://www.balisage.net/Proceedings/vol8/html/Piez01/BalisageVol8-Piez01.html | access-date = 2014-10-14 | doi-access = free }}
- {{ cite conference
| title = Proceedings of Balisage: The Markup Conference 2014
| chapter = Hierarchies within range space: From LMNL to OHCO
| first1 = Wendell | last1 = Piez
| date = 2014 | volume = 13
| chapter-url = https://www.balisage.net/Proceedings/vol13/html/Piez01/BalisageVol13-Piez01.html
| conference = Balisage: The Markup Conference 2014 | location = Montréal
| doi = 10.4242/BalisageVol13.Piez01
}}
- {{Cite web | last1 = Renear | first1 = Allen | last2 = Mylonas | first2 = Elli | last3 = Durand | first3 = David | date = 1993-01-06 | url = http://cds.library.brown.edu/resources/stg/monographs/ohco.html | title = Refining our Notion of What Text Really Is: The Problem of Overlapping Hierarchies | access-date = 2016-10-02 | citeseerx = 10.1.1.172.9017 | hdl = 2142/9407 | archive-date = 2021-03-23 | archive-url = https://web.archive.org/web/20210323204039/http://cds.library.brown.edu/resources/stg/monographs/ohco.html | url-status = dead }}
- {{ cite conference | first = Oliver | last = Schonefeld
| chapter = A Simple API for XCONCUR: Processing concurrent markup using an event-centric API
| title = Proceedings of Balisage: The Markup Conference 2008 | volume = 1
| chapter-url = http://www.balisage.net/Proceedings/vol1/html/Schonefeld01/BalisageVol1-Schonefeld01.html | doi = 10.4242/BalisageVol1.Schonefeld01 | access-date = 2014-10-14
| conference = Balisage: The Markup Conference 2008 | date = August 2008 | location = Montréal
}}
- {{ Cite book
| chapter-url = http://cmsmcq.com/2000/poddp2000.html
| first1 = C M | last1 = Sperberg-McQueen | author1-link = Michael Sperberg-McQueen | first2 = Claus | last2 = Huitfeldt | title = Digital Documents: Systems and Principles
| chapter = GODDAG: A Data Structure for Overlapping Hierarchies
| series = Lecture Notes in Computer Science
| date = 2004
| access-date = 2014-10-14 | volume = 2023
| pages = 139–160
| doi = 10.1007/978-3-540-39916-2_12
| isbn = 978-3-540-21070-2
}}
- {{ cite conference
| title = Proceedings of Balisage: The Markup Conference 2009
| first1 = Desmond | last1 = Schmidt
| chapter = Merging Multi-Version Texts: A Generic Solution to the Overlap Problem | date = 2009
| volume = 3 | chapter-url = https://www.balisage.net/Proceedings/vol3/html/Schmidt01/BalisageVol3-Schmidt01.html
| conference = Balisage: The Markup Conference 2009 | location = Montréal
| doi = 10.4242/BalisageVol3.Schmidt01
| isbn = 978-0-9824344-2-0 }}
- {{ cite journal
| title = The role of markup in the digital humanities
| first1 = Desmond | last1 = Schmidt
| date = 2012
| journal = Historical Social Research | volume = 37 | issue = 3 | pages = 125–146
| doi = 10.12759/hsr.37.2012.3.125-146
}}
- {{cite web | url = https://hsivonen.fi/soup-dom/ | title = Tag Soup: How Mac IE 5 and Safari handle <x> <y> </x> </y> | author = Henri Sivonen | date = 2003-08-16 | access-date = 2017-11-05 }}
- {{ cite conference | last1 = Ide | first1 = Nancy | last2 = Suderman | first2 = Keith | date= 2007 | title = GrAF: A graph-based format for linguistic annotations | conference = Proceedings of the First Linguistic Annotation Workshop (LAW-2007, Prague, Czech Republic) | pages = 1–8 | url = http://www.aclweb.org/old_anthology/W/W07/W07-15.pdf#page=15 | citeseerx = 10.1.1.146.4543 }}
- {{cite web | last1 = Tennison | first1 = Jeni | date = 2008-12-06 | url = http://www.jenitennison.com/2008/12/06/overlap-containment-and-dominance.html | title = Overlap, Containment and Dominance | access-date = 2016-10-02 }}
- {{cite conference | first1 = Andreas | last1 = Witt | first2 = Oliver | last2 = Schonefeld | first3 = Georg | last3 = Rehm | first4 = Jonathan | last4 = Khoo | first5 = Kilian | last5 = Evang | title = On the Lossless Transformation of Single-File, Multi-Layer Annotations into Multi-Rooted Trees | url = http://conferences.idealliance.org/extreme/html/2007/Witt01/EML2007Witt01.xml | access-date = 2014-10-14 | conference = Extreme Markup Languages 2007 | date = 2007 | location = Montréal | archive-date = 2014-10-17 | archive-url = https://web.archive.org/web/20141017213225/http://conferences.idealliance.org/extreme/html/2007/Witt01/EML2007Witt01.xml | url-status = dead }}
- {{cite web | title = Guidelines for Electronic Text Encoding and Interchange | edition = 5 | author = Text Encoding Initiative Consortium | date = 16 September 2014 | access-date = 2014-10-14 | url = http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ | ref = {{sfnref|Text Encoding Initiative}} }}
- {{cite web | title = HTML Living Standard | author = WHATWG | access-date = 2019-03-25 | url = https://html.spec.whatwg.org/multipage/ | ref = {{sfnref|HTML}} | author-link = WHATWG }}