The Project Gutenberg EBook of The Guide to PGTEI by Marcello Perathoner
This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License included with this eBook or online at http://www.gutenberg.org/license
Title: The Guide to PGTEI
Author: Marcello Perathoner
Release Date: April 12, 2005 [EBook #20000]
Language: English
***START OF THIS PROJECT GUTENBERG EBOOK THE GUIDE TO PGTEI***
This document does not represent any official PG standard. It describes the ‘dialect’ of TEI used by the Gnutenberg Press, which is part of the online text conversion system which is being installed on www.gutenberg.org.
The Gnutenberg Press will convert any TEI conformant text, but texts marked up according to this guide will look better and contain all the necessary headers and footers for posting on PG.
To mark up a text means to identify its components according to a set of rules.
Basically you say: this (point in the text) is the start of a paragraph; this is the end of a paragraph; this is the start of a chapter; this is the end, etc.
In TEI-speak a text component is referred to as an element. Paragraphs, chapters, highlighted words, quotes, footnotes, etc. are such elements.
To mark a text region as an element you have to insert an opening tag at the start and a closing tag at the end of the text region. In TEI a paragraph is represented by an element of type p. You type an opening tag by enclosing the element type in angle brackets: <p>. A closing tag has a slash after the first bracket: </p>.
Here's an example of how you would mark up a paragraph:
<p>'Oh, bless you, it doesn't matter in the least. If the man is caught, it will be _on account_ of their exertions; if he escapes, it will be _in spite_ of their exertions. It's heads I win and tails you lose.'</p>
Don't worry about the line breaks; the text will be reformatted anyway. The formatter knows where a paragraph ends by the </p> tag and does not care about empty lines and such.
Let's do some more markup. In TEI the <emph> element stands for emphasized text:
<p>'Oh, bless you, it doesn't matter in the least. If the man is caught, it will be <emph>on account</emph> of their exertions; if he escapes, it will be <emph>in spite</emph> of their exertions. It's heads I win and tails you lose.'</p>
In TEI the <q> element stands for quoted text:
<p><q>Oh, bless you, it doesn't matter in the least. If the man is caught, it will be <emph>on account</emph> of their exertions; if he escapes, it will be <emph>in spite</emph> of their exertions. It's heads I win and tails you lose.</q></p>
Every opening tag needs a corresponding closing tag. Opening and closing tags must always nest like parentheses in a mathematical equation.
This is right:
<q><emph> ... </emph></q>
and this is wrong:
<q><emph> ... </q></emph>
Most elements can take attributes. Attributes are used to add specifications to elements in exactly the same way in which adjectives add specifications to nouns.
<p> ... Above all, why should the second man write up the German word <foreign lang="de">Rache</foreign> before decamping? ... </p>
In TEI the <foreign> element is used to mark a passage in a foreign language. The lang attribute specifies which language it is.
The attribute name must be followed by an = and the attribute value must be put in quotes. An element can have zero or more attributes but every attribute must have a different name.
<p> ... Above all, why should the second man write up the German word <foreign lang="de" rend="italic">Rache</foreign> before decamping? ... </p>
In TEI you can specify characters you don't have on your keyboard with entities. Let's see how to insert the em-dash character, that is the long dash you see in printed books. (In PG etexts that character is mostly represented by two dashes -- because ASCII lacks that character.)
<q> ... Among other things I bought these brown boots &mdash; gave six dollars for them &mdash; and had one stolen before ever I had them on my feet.</q>
In TEI the entity &mdash; represents an em-dash. Substituting &mdash; for -- makes the text look more professional.
Entities start with an ampersand (&) and end with a semicolon (;). You can find a list of supported TEI entities in Chapter 18.
You can and should mark up a text incrementally. That is: make more than one pass over the whole text and in each pass mark up a subset of elements.
You may start marking only the most prominent text features like chapters and paragraphs. Later you make a second pass marking all italicized text. If you still want to do more, make another pass replacing all quotation marks with the <q> element.
TODO: a PG working group needs to codify different ‘levels’ of PGTEI markup.
Most probably you will start with a TEI text automatically generated by some program from the plain vanilla etext. Your task will then be to proof the tags inserted by the program.
If you cannot state with confidence the reason why a text passage is highlighted, use the generic <hi> tag. A person more knowledgeable than you can easily make another pass over the text searching for all generic tags and replacing them with more appropriate specific tags, e.g. the <emph> or <title> tags.
If you encounter a passage in a foreign language unknown to you just use the bare <foreign>. Another person who knows the language may add the lang attribute.
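As a rough sketch of what such a first pass might leave behind, and what a later pass could turn it into: the sentences below are invented for illustration, and only the <hi>, <title> and <foreign> elements and the lang attribute come from this guide.

```xml
<!-- First pass: the reason for the highlighting and the language are not yet known. -->
<p>He had been reading <hi>The Martyrdom of Man</hi> all evening.</p>
<p>She answered with a shrug and a quiet <foreign>tant pis</foreign>.</p>

<!-- A later pass by someone who can tell a title from emphasis and knows the language: -->
<p>He had been reading <title>The Martyrdom of Man</title> all evening.</p>
<p>She answered with a shrug and a quiet <foreign lang="fr">tant pis</foreign>.</p>
```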
You can insert comments any place you want. These will stay in the TEI text but not show up in the formatted output. By using the word FIXME you can mark positions that require further inspection.
A comment starts with <!-- and ends with -->.
<p><q>Oh, bless you, it doesn't matter in the least. If the man is caught, it will be <emph>on account</emph> of their exertions; if he escapes, it will be <emph>in spite</emph> of their exertions. It's heads I win and tails you lose. Whatever they do, they will have followers. <!-- FIXME: provide a footnote with a translation --> <quote lang="fr">Un sot trouve toujours un plus sot qui l'admire.</quote></q></p>
Later it will be easy to search for all FIXME in the text and fix them.
One of the advantages of XML is that a program can check the markup for you. To do this you need a validator and the DTD (Document Type Definition).
You can get XML validators from here:
And here is the PGTEI DTD.
For all of you who don't want to install a validator on your own PC there is an online validation service for PGTEI. It can also convert your text to different output formats.
As the primary source of information, refer to TEI Lite: An Introduction to Text Encoding for Interchange, June 1995, revised May 2002.
A still smaller subset of TEI is described in Bare Bones TEI: A Very Very Small Subset of the TEI Encoding Scheme, Document No. TEI U6, 30 Aug 1994, revised June 1995.
The complete TEI markup language (caveat emptor) is described in TEI P4: Guidelines for Electronic Text Encoding and Interchange, 2002.
The homepage of the Text Encoding Initiative Consortium has much other interesting material and many links.
Language Codes: Code for the Representation of the Names of Languages. From ISO 639, revised 1989
The rest of this guide explains the implementation details and limitations of the pg-press system and shows more examples. Numbered headers refer to the corresponding section in the TEI Lite Introduction.
These are examples of the official header and footer in a PGTEI text. The <publicationStmt> section is mandatory.
<?xml version="1.0" encoding="iso-8859-1" ?>
<!DOCTYPE TEI.2 SYSTEM "http://www.gutenberg.org/tei/marcello/0.3/dtd/pgtei.dtd">
<TEI.2 lang="en">
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>Alice's Adventures in Wonderland</title>
      <respStmt><resp>Illustrated by</resp> <name>John Tenniel</name></respStmt>
      <author><name reg="Carroll, Lewis">Lewis Carroll</name></author>
      <editor role="illustrator"><name reg="Tenniel, John">John Tenniel</name></editor>
    </titleStmt>
    <editionStmt>
      <edition n="30"> Edition 30 </edition>
    </editionStmt>
    <publicationStmt>
      <publisher>Project Gutenberg</publisher>
      <date value="1991-01">January, 1991</date>
      <idno type="etext-no">11</idno>
      <availability>
        <p>This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever. You may copy it, give it away or re-use it under the terms of the Project Gutenberg License online at www.gutenberg.org/license</p>
      </availability>
    </publicationStmt>
    <seriesStmt>
      <title level="s">#1 in our series by Lewis Carroll</title>
      <idno type="vol">1</idno>
    </seriesStmt>
    <sourceDesc>
      <bibl> Unknown </bibl>
    </sourceDesc>
  </fileDesc>
  <encodingDesc>
    <classDecl>
      <taxonomy id="lc">
        <bibl> <title>Library of Congress Classification</title> </bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <language id="en"></language>
      <language id="fr"></language>
    </langUsage>
    <textClass>
      <classCode scheme="lc">PR</classCode>
      <keywords>
        <list>
          <item>Alice</item>
        </list>
      </keywords>
    </textClass>
  </profileDesc>
  <revisionDesc>
    <change>
      <date value="1991-01">January 1991</date>
      <respStmt> <name>Anonymous</name> </respStmt>
      <item>Project Gutenberg edition 10</item>
    </change>
    <change>
      <date value="1994-03">March 1994</date>
      <respStmt> <name>Anonymous</name> </respStmt>
      <item>Project Gutenberg edition 30</item>
    </change>
    <change>
      <date value="2003-03">March 2003</date>
      <respStmt> <name>Marcello Perathoner</name> <!-- marcello@perathoner.de --> </respStmt>
      <item>TEI Markup</item>
    </change>
  </revisionDesc>
</teiHeader>
<text>
<front>
  <divGen type="titlepage" />
  <divGen type="pgheader" rend="newpage" />
  <divGen type="toc" rend="newdoublepage" />
</front>
<body rend="newdoublepage">
And this is the footer:
</body>
<back rend="newdoublepage">
  <divGen type="footnotes" />
  <divGen type="colophon" rend="newpage" />
  <divGen type="pgfooter" rend="newpage" />
</back>
</text>
</TEI.2>
See the TEI-Lite introduction.
Composite texts are not supported.
Unsupported
Unsupported.
See the TEI-Lite introduction.
On a block element the attribute rend may take one or more of the following values:
This block is left-adjusted.
Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem.
This block is centered.
This block is right-adjusted.
This block is left- and right-justified.
This entity is rendered as a block and has wider margins.
Use for examples of code. This block is rendered in a monospaced font. Line breaks are preserved.
This block gets indented by n em-spaces. n may be negative.
This element starts a new page.
<div rend="newpage" id="chapter2"> <head>Chapter 2</head> <p> ... </p> </div>
This element starts a new right-hand page.
<div rend="newdoublepage" id="part2"> <head>Part 2</head> <p> ... </p> </div>
Floats the division to the left or right margin (HTML mode) or to the top or bottom of the page or to a special page (PDF mode). Valid value is a string composed of one or more of the option letters:
The division is floated to the left margin. HTML mode only.
The division is floated to the right margin. HTML mode only.
The floated division may stay here if there is enough room left on this page. PDF mode only.
The division may float to the top of the current page if there is enough room for both it and the previous text. If this is not the case, it is added at the top of the next page. The subsequent text continues on the current page. PDF mode only.
The division may float to the bottom of the current page. The subsequent text continues until the room left on the current page is just enough for the float. If there is already insufficient room, the float will be put at the bottom of the next page. PDF mode only.
The division may float to a special page containing only floats. PDF mode only.
The picture in the next example will float to the left margin in HTML mode. In PDF mode it will appear at this point in the text if there is enough room left on the page, else it will float to the top of the next page.
<p rend="float(lht) center"> <figure url="alice-01.png"> <figDesc>White Rabbit checking watch.</figDesc> </figure> </p>
You may also use one of the following shortcuts:
Shortcut for text-align(left).
Shortcut for text-align(center).
Shortcut for text-align(right).
Shortcut for text-align(justify).
Shortcut for indent(2).
Attribute rend takes the following values in addition to those listed under “All Block Elements”:
This paragraph will not have its first line indented.
If you try to mark up an embedded letter (piece of correspondence) you'll be surprised to find that the simple approach doesn't validate. Use this approach instead:
<div> <head>Chapter 1. The Spread of Evolution</head> <p>His book on animals and plants ... <text> <body> <head>C. Darwin to T.H.Huxley</head> <p> ... </p> </body> </text> ... </p> </div>
His book on animals and plants ...
...
...
Use <head> for the main header and <head type="sub"> for any subtitle. It is up to you to decide which title is main and which are sub.
<div> <head>Part I</head> <head type="sub">Being a Reprint from the Reminiscences of John H. Watson, M.D. late of the Army Medical Department</head> <div> <head>Mr. Sherlock Holmes</head> <p>In the year 1878 I took my degree of Doctor ... </p> <p>The campaign brought honours and promotion to many, ... </p> ... </div> <div> <head>The Science of Deduction</head> <p> We met next day as he had arranged, and inspected the rooms at No. 221b, Baker Street, of which he had spoken at our meeting. ... </p> ... </div> ... </div>
Attribute part: only the values I, M and F are supported.
Example:
<lg type="limerick">
  <l>There was a young lady of Riga,</l>
  <l>Who smiled as she rode on a tiger;</l>
  <l rend="indent">They came back from the ride</l>
  <l rend="indent">With the lady inside,</l>
  <l>And the smile on the face of the tiger.</l>
</lg>
Will be rendered as:
There was a young lady of Riga,
Who smiled as she rode on a tiger;
    They came back from the ride
    With the lady inside,
And the smile on the face of the tiger.
<sp><speaker>Margarete</speaker>
  <l part="I">Versprich mir, Heinrich!</l>
</sp>
<sp><speaker>Faust</speaker>
  <l part="F">Was ich kann!</l>
</sp>
<sp><speaker>Margarete</speaker>
  <l>Nun sag, wie hast du's mit der Religion?</l>
  <l>Du bist ein herzlich guter Mann,</l>
  <l>Allein ich glaub, du hältst nicht viel davon.</l>
</sp>
Will be rendered as:
Versprich mir, Heinrich!
                         Was ich kann!
Nun sag, wie hast du's mit der Religion?
Du bist ein herzlich guter Mann,
Allein ich glaub, du hältst nicht viel davon.
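The example above uses only the values I and F. A sketch of a verse line shared among more than two speeches, assuming that M (a medial part) is handled analogously, might look like this. The line is the shared verse from Macbeth, act 2, scene 2; how the middle parts are positioned in the output is not stated in this guide.

```xml
<sp><speaker>Lady Macbeth</speaker>
  <l part="I">Did not you speak?</l>
</sp>
<sp><speaker>Macbeth</speaker>
  <l part="M">When?</l>
</sp>
<sp><speaker>Lady Macbeth</speaker>
  <l part="M">Now.</l>
</sp>
<sp><speaker>Macbeth</speaker>
  <l part="F">As I descended?</l>
</sp>
```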
See the TEI-Lite introduction.
This tag has different semantics than in TEI: without an ed attribute it produces a line break in the output at this point. (In TEI it should just record a line break in a certain edition.) There ain't a tag in TEI for a forced line break that is not a poetry line, so I collared this one.
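A minimal sketch of a forced line break, assuming the tag described here is TEI's <lb> (the element that carries the ed attribute). The address lines are made up around the Baker Street example used elsewhere in this guide.

```xml
<!-- No ed attribute, so each <lb /> would force a line break in the output. -->
<p>Mr. Sherlock Holmes,<lb />
221b, Baker Street,<lb />
London.</p>
```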
You can generate thought-breaks if you set the unit attribute to tb. The default is to generate a small vertical gap.
These are the supported values for the rend attribute:
generates a thought-break consisting of n stars (asterisks).
generates a horizontal rule that is n % the width of the text.
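A sketch of a plain thought-break between two paragraphs, assuming the element in question is TEI's <milestone>, which carries the unit attribute. No rend value is given, so the default small vertical gap would be generated. The two paragraphs are invented.

```xml
<p>... and with that he blew out the candle.</p>

<!-- Thought-break with default rendering (a small vertical gap). -->
<milestone unit="tb" />

<p>When he woke, the sun was already high. ...</p>
```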
See the TEI-Lite introduction.
These are the supported values for the rend attribute when applied to inline elements such as <hi>:
for text in italics
for bold text
for underlined text
for text in Small Capitals
for superscript text
for subscript text
for expanded text
for strikeout text
for smaller text
for small text
for large text
for larger text
for tty-type text
where x is a font family name: Times New Roman, Courier or Zapf Chancery. Note that display depends also on the fonts actually available on the user's machine.
where x is a percentage value: 50% 75% 100% 150% 200%
where x is a value between 100 and 900: 400 700 900. Note that display depends also on the fonts actually available on the user's machine.
<p>I have left everything <foreign lang="la" rend="italic">in statu quo</foreign> until I hear from you.</p>
The default rendering for <hi> without any rend attribute is italic.
Attribute rend takes the following values:
This quote is rendered as a displayed paragraph.
This quote has the opening mark only.
This quote has only the closing mark.
This quote has no quotation marks.
<p> <q rend="pre">The first thing that put us out was that advertisement. Spaulding, he came down into the office just this day eight weeks, with this very paper in his hand, and he says:</q> </p> <p> <q><q>I wish to the Lord, Mr. Wilson, that I was a red-headed man.</q></q> </p>
Will be rendered as:
“The first thing that put us out was that advertisement. Spaulding, he came down into the office just this day eight weeks, with this very paper in his hand, and he says:
“ ‘I wish to the Lord, Mr. Wilson, that I was a red-headed man.’ ”
See the TEI-Lite introduction.
The place attribute supports only the values of foot, end and margin.
The n attribute is not supported.
The note text should always be enclosed in paragraphs.
<note place="foot"> inserts a footnote marker at the exact point in the text. There should be no space between the commented text and the opening <note>, any space should be moved after the closing </note>.
<p> When I was a boy, there was but one permanent ambition among my comrades in our village<note place="foot"> <p> Hannibal, Missouri. </p> </note> on the west bank of the Mississippi River. That was, to be a steamboatman. ... </p>
Will be rendered as:
When I was a boy, there was but one permanent ambition among my comrades in our village[2] on the west bank of the Mississippi River. That was, to be a steamboatman. ...
The handling of footnotes depends on the output format: if the format has facilities for pagination (PDF) the footnote appears at the bottom of the current page. If the format has no such facilities (HTML, TXT, PDB) the footnote appears at the end of the text. In HTML the footnote marker will be linked to the footnote text.
The endnote is less intrusive than the footnote and you should use it for any notes you add to the text yourself.
<note place="end"> does the same as <note place="foot"> for the HTML, TXT and PDB formats.
In the PDF format the endnotes get listed in the back matter with the page number. Because the user can only see the page number and not the exact position to which the note is attached, you should insert a short ‘reminder’ in the note text.
<p>Today about three o'clock the proofs of this paper arrived from the printers. The exercise consists of half a chapter of Thucydides<note place="end" resp="mp"> <p> <term>Thucydides</term>: Greek historian, remembered for his <title>History of the Peloponnesian War.</title> </p> </note>. I had to read it over carefully, as the text must be absolutely correct. ... </p>
Will be rendered as:
Today about three o'clock the proofs of this paper arrived from the printers. The exercise consists of half a chapter of Thucydides[3]. I had to read it over carefully, as the text must be absolutely correct. ...
See the TEI-Lite introduction.
Note: links work only in the HTML and PDF formats.
Use these for internal links.
Attribute target supports only one destination.
<p>See: <ref target="chapter42">Chapter 42</ref>.</p> ... <anchor id="chapter42" /> <div> <head type="sub">Chapter 42</head> <head>The Answer</head> <p> Wouldn't you like to know? ... </p> ... </div>
See the TEI-Lite introduction.
Attribute rend value run-on is not supported.
Always put one or more paragraphs (<p>) into an item.
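A minimal sketch of a list that follows this rule. The items are borrowed from the list of required tools later in this guide.

```xml
<list>
  <!-- Every item wraps its text in at least one <p>. -->
  <item><p>libxml2 and libxslt</p></item>
  <item><p>Perl</p></item>
  <item><p>GNU groff</p></item>
</list>
```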
See the TEI-Lite introduction.
Attribute rend takes the following values:
Use this to give the table rules around every cell.
PDF output only. Use to give TeX hints about the table columns. The table is implemented using the LaTeX longtable environment.
<table rend="latexcolumns(|l|r|p{5cm}|)">
TXT output only. Use to give nroff hints about the table columns. The table is implemented using the tbl preprocessor.
<table rend="tblcolumns(l l lw50)">
Attribute role not supported.
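For orientation, a complete small table might look like the following sketch. The <row> and <cell> elements are standard TEI Lite; the cell contents are invented, and the latexcolumns hint would only affect PDF output.

```xml
<table rend="latexcolumns(|l|r|)">
  <row>
    <cell>Chapters</cell>
    <cell>12</cell>
  </row>
  <row>
    <cell>Illustrations</cell>
    <cell>42</cell>
  </row>
</table>
```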
See the TEI-Lite introduction.
Only PNG and JPEG formats are supported at present.
Attribute entity not supported. Use url instead.
New attribute url: the url of the image file.
Attribute rend texwidth is the width the figure is scaled to in PDF (through TeX) output. 100% represents the current line width.
<p rend="float(htb) center"> <figure url="alice-01.png" rend="texwidth(50%)"> <figDesc>White Rabbit checking watch.</figDesc> </figure> </p>
See the TEI-Lite introduction.
Use for examples. This is a block element, with line breaks preserved. In HTML it is also rendered as a shaded box.
<eg>
Example
Example
Example
</eg>
Will be rendered:
Example
Example
Example
Attribute notation can take the following values:
In PDF output mode this will pipe the contents of the <formula> element directly through to the TeX processor.
In HTML output mode the contents of the <formula> element will be passed to an instance of TeX and converted to an image. The resulting image file is inserted into the HTML file.
In all other output modes it will be ignored.
In HTML output mode the contents of the <formula> element will be inserted literally into the HTML file.
In all other output modes it will be ignored.
In HTML and PDF output modes the SVG contents of the <formula> element will be converted to an image and inserted into the file.
In all other output modes it will be ignored.
Example:
<p>This is an inlined formula: <formula notation="tex">$\int_0^\infty f(x)\,dx$</formula>. And this is some more text after the inlined formula. </p>
Will display as:
This is an inlined formula: $\int_0^\infty f(x)\,dx$. And this is some more text after the inlined formula.
Example:
<p rend="center">
<formula notation="tex"><![CDATA[
\begin{align*}
\left[ \frac{n}{p} \right] + \left[ \frac{n}{p^2} \right] &+ \left[ \frac{n}{p^3} \right] + \ldots \\
&= \sum_{i=0}^h \frac{a_i(p^{h-i} - 1)}{p - 1} \\
&= \frac{a_0p^h + a_1p^{h-1} + \ldots + a_h - (a_0 + a_1 + \ldots + a_h)}{p-1} \\
&= \frac{n - (a_0 + a_1 + \ldots + a_h)}{p - 1}.
\end{align*}]]>
</formula>
</p>
Note the use of a CDATA section to avoid having to replace all &s with &amp;. A CDATA section starts with <![CDATA[ and ends with ]]>.
Will display as:
An embedded SVG image.
<p rend="center">
<formula notation="svg"><![CDATA[
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="12cm" height="4cm" viewBox="0 0 1200 400" xmlns="http://www.w3.org/2000/svg" version="1.1">
  <polygon fill="red" stroke="yellow" stroke-width="10"
           points="350,75 379,161 469,161 397,215 423,301 350,250 277,301 303,215 231,161 321,161" />
</svg>
]]>
</formula>
</p>
Will display as:
Attribute n sets the title of the generated section. If it is missing, a default title is used. See below.
Attribute type supports the following values:
Generates a standard title page from the <teiHeader> element.
Generates a colophon from the <teiHeader> element. This includes all notes in the header and all revisions to the text. Default title is “Credits”.
Generates a table of contents from <index index="toc"> elements. Default title is “Contents”.
Generates a standard PG header appropriate for the output format.
Generates a standard PG footer appropriate for the output format.
Generates a footnotes section. This section is automatically populated with the contents of the <note place="foot"> tags found in the text. Default title is “Notes”.
Here is an example for the front matter:
<front>
  <divGen type="titlepage" />
  <divGen type="pgheader" rend="newpage" />
  <divGen type="toc" rend="newdoublepage" n="Inhaltsverzeichnis" />
</front>
And this is an example for the back matter:
<back rend="newdoublepage">
  <divGen type="footnotes" />
  <divGen type="colophon" rend="newpage" />
  <divGen type="pgfooter" rend="newpage" />
</back>
Attributes level2 through level4 not supported.
Attribute index supports the following values:
The table of contents.
The bookmarks section of a PDF file. Note that no special characters (like &mdash;) should be used for a PDF bookmark.
The bookmarks section of a PDB file. Note that PDB can accommodate a maximum of 15 characters per bookmark. Strings exceeding this length will be truncated.
Element <index> attribute level1 will default to the contents of the next <head> element.
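A sketch of a chapter that registers itself in the table of contents. No level1 attribute is given, so the entry would default to the contents of the <head> that follows. The chapter title is a placeholder.

```xml
<div rend="newpage">
  <!-- level1 omitted: the TOC entry is taken from the next <head>. -->
  <index index="toc" />
  <head>Chapter 1</head>
  <p> ... </p>
</div>
```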
See the TEI-Lite introduction.
You should use a Unicode-capable editor to edit your files and save them in UTF-8 encoding. If you cannot do that, you'll have to choose a different encoding and enter all characters your encoding cannot handle with XML entities. To do that, you'll have to find out the Unicode code point of the character first.
Ways to use XML entities
Form | Example | yields |
---|---|---|
Decimal | &#162; | ¢ |
Hexadecimal | &#xA2; | ¢ |
Named | &cent; | ¢ |
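As a small sketch, the three forms could be used interchangeably in running text. The cent sign is Unicode code point U+00A2, which is 162 in decimal; the sample sentence is invented, and the named form is assumed to be declared by the DTD.

```xml
<!-- All three paragraphs should yield the same output text. -->
<p>Price: 5&#162;</p>  <!-- decimal character reference -->
<p>Price: 5&#xA2;</p>  <!-- hexadecimal character reference -->
<p>Price: 5&cent;</p>  <!-- named entity; must be declared in the DTD -->
```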
You must replace these characters if they occur in your original text. This is because TEI recognizes them as special characters, e.g. < and > are the start and the end of a markup tag, & is the start of an entity.
XML special characters
replace | with |
---|---|
& | &amp; |
< | &lt; |
> | &gt; |
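A short sketch of these replacements in practice. The original text is assumed to read "Dombey & Son" and "If a < b, then b > a."; both sentences are only illustrations.

```xml
<p>Dombey &amp; Son</p>
<p>If a &lt; b, then b &gt; a.</p>
```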
If you use the ISO-8859-1 encoding to save your TEI file, you will not be able to enter these characters directly. You can still get them if you write:
Useful characters not in ISO-8859-1
to get | type | comment |
---|---|---|
Œ | &OElig; | The OE ligature used in French texts. |
œ | &oelig; | The oe ligature used in French texts. |
Š | &Scaron; | An S with an inverted hat. |
š | &scaron; | An s with an inverted hat. |
Ÿ | &Yuml; | A Y with a diaeresis. |
ƒ | &fnof; | Mathematical function of. |
>< | &zwj; | Zero-width joiner. Use: shelf&zwj;ful to get rid of the ff ligature. Sometimes TeX may need a little hint. See shelfful vs. shelfful. Use also to make sure a word does not get broken here. |
> < | &nbsp; | A non-breaking space. Use: Mr.&nbsp;Sherlock Holmes |
> < | &ensp; | A space the size of an n. |
> < | &emsp; | A space the size of an m. A typographical quad. |
> < | &thinsp; | A thin space. May be appropriate between quotes. Note: the program inserts a thin space between quotes automatically if you mark up quotes using the <q> tag. |
– | &ndash; | Use between numbers. E.g. 1914&ndash;18: The great war of 1914–18. |
— | &mdash; | Use instead of -- |
—— | &qdash; | Use instead of ----. A quote dash or two-em dash is used to indicate missing letters: Mr. P—— woke up. |
‘ | &lsquo; | |
’ | &rsquo; | |
‚ | &sbquo; | |
“ | &ldquo; | |
” | &rdquo; | |
„ | &bdquo; | |
† | &dagger; | |
‡ | &Dagger; | |
• | &bull; | bullet |
… | &hellip; | Horizontal ellipsis. Use instead of three dots. Note how the ellipsis dots are spaced farther apart than if you enter three dots: ... |
‰ | &permil; | |
′ | &prime; | Use for coordinates. |
″ | &Prime; | Use for coordinates. |
‹ | &lsaquo; | French guillemet |
› | &rsaquo; | French guillemet |
€ | &euro; | |
™ | &trade; | |
♠ | &spades; | The card suits. |
♡ | &hearts; | |
♢ | &diams; | |
♣ | &clubs; | |
If you are using a Unicode-capable editor, you can just enter the characters directly.
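If you are not, a few of these characters might be typed as in the following sketch. The sentences are invented for illustration. &qdash; is specific to PGTEI; the other names are the usual named entities for these characters and are assumed here to be declared by the PGTEI DTD.

```xml
<!-- Non-breaking space, quote dash and en dash. -->
<p>Mr.&nbsp;P&qdash; fought in the great war of 1914&ndash;18.</p>

<!-- Em-dash instead of two hyphens. -->
<p>He paid six dollars for the boots &mdash; and had one stolen.</p>

<!-- Zero-width joiner to suppress the ff ligature. -->
<p>A whole shelf&zwj;ful of books was missing.</p>
```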
See the TEI-Lite introduction.
An epigraph contains a quotation, anonymous or attributed, appearing at the start of a section or chapter, or on a title page. An epigraph is rendered in smaller type and right adjusted.
Monte Video — Maldonado — Excursion to R Polanco — Lazo and Bolas — Partridges — Absence of Trees — Deer — Capybara, or River Hog — Tucutuco — Molothrus, cuckoo-like habits — Tyrant Flycatcher — Mocking-bird — Carrion Hawks — Tubes formed by Lightning — House struck
A formal list or prose description of the topics addressed by a subdivision of a text.
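A sketch of how an argument like the one shown above might be marked up, assuming <argument> is placed after the chapter head and takes paragraphs as in TEI Lite. The head text is a placeholder and the list of topics is shortened.

```xml
<div>
  <head>Maldonado</head>
  <argument>
    <p>Monte Video &mdash; Maldonado &mdash; Excursion to R Polanco &mdash;
    Lazo and Bolas &mdash; Partridges &mdash; Absence of Trees</p>
  </argument>
  <p> ... </p>
</div>
```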
See the TEI-Lite introduction.
Experimental
Partially supported through the <divGen type="colophon"> element.
You should not try to build a conformant header by yourself (unless you are smarter than I am) but just copy the provided header template and modify the appropriate entries.
Used to insert conditional text.
Attribute has takes the following values:
Test if the text requires a footnote section.
Only paginated output formats like PDF can place the footnotes at the foot of the page. Other formats like HTML don't know pages at all, so we have to place the footnotes at the end of the whole text. (PDF too can have endnotes — notes that appear at the end of the book instead of at the foot of the page.)
This example creates a <back> only if there are footnotes. In a text with footnotes it will create a <back> in the HTML output format but not in the PDF output format.
<pgIf has="footnotes">
  <then>
    <back rend="newdoublepage">
      <divGen type="footnotes" />
    </back>
  </then>
</pgIf>
Attribute output takes the following values:
Test if the output format is HTML.
Test if the output format is TeX. TeX is presently used for PDF generation.
Test if the output format is NROFF. NROFF is presently used for TXT and PDB generation.
If you use this feature your text will need revision to accommodate any change in the TEI processing system. For instance, it is not guaranteed that PDF output will always be generated by TeX nor that TXT will always go through NROFF.
<div rend="display,right"> <pgIf output="tex"> <then> <p> <formula notation="tex"> \reflectbox{Jabberwocky}\medbreak \reflectbox{'Twas brillig, and the slithy toves}\break \reflectbox{\quad Did gyre and gimble in the wabe;}\break \reflectbox{All mimsy were the borogoves,}\break \reflectbox{\quad And the mome raths outgrabe.}\par </formula> </p> </then> <else> <lg> <l rend="right"> ykcowrebbaJ </l> </lg> <lg> <l rend="right"> sevot yhtils eht dna ,gillirb sawT' </l> <l rend="right"> ebaw eht ni elbmig dna eryg diD  </l> <l rend="right"> ,sevogorob eht erew ysmim llA </l> <l rend="right"> .ebargtuo shtar emom eht dnA  </l> </lg> </else> </pgIf> </div>
Will be rendered as (if you are viewing the PDF file you will see true mirrored text[4]):
ykcowrebbaJ
sevot yhtils eht dna ,gillirb sawT'
ebaw eht ni elbmig dna eryg diD
,sevogorob eht erew ysmim llA
.ebargtuo shtar emom eht dnA
The Gnutenberg Press is the software to convert from TEI to HTML, TeX, TXT and PDB. It is a collection of XSLT stylesheets driven by a Perl script.
This is a diagram showing how the conversion is done.
The Gnutenberg Press
The XSLT stylesheets do the bulk of the work. The Perl script calls XSLT at the right moments and fixes up things that are just too difficult to get right with XSLT, like the correct placement of newlines, which is crucial to TeX and nroff.
nroff is called twice with slightly different parameters: with the latin1 device and line breaking on for TXT, and with a custom PDB device and line breaking off for PDB. The PDB device is customized for the special Palm OS character set.
The Gnutenberg Press is released under the GNU General Public License (GPL).
You may download the Gnutenberg Press.
To use the Gnutenberg Press you need these tools:
Get libxml2 and libxslt from the XML C parser and toolkit of Gnome.
The Pathologically Eclectic Rubbish Lister by Larry Wall in a version >= 5.8.0. Get Perl from the Comprehensive Perl Archive Network. Install the XML::LibXML and XML::LibXSLT modules from CPAN.
The typesetting system invented by Donald Knuth. Get TeX from the TeX Users Group Home Page.
Get GNU groff.
You need a patched version of txt2pdbdoc. The patchfile is contained in the Gnutenberg Press archive.
If you are running a fairly recent Linux distribution you should already have got most of them. If you are on Windows you'll have to sweat some to get them all, but, if you run Windows, you like to suffer, right?
If you have non-ISO-8859-1 characters in the headings, the PDF conversion will choke. You'll have to use the <index index="pdf"> tag to provide an alternate heading without those special characters.
In this example the PDF converter would choke on the mdash character in the heading. Thus you have to provide an alternate heading for the PDF bookmark section.
<div> <index index="pdf" level1="Chapter 1 -- First Day" /> <head>Chapter 1 — First Day</head>
[1] This note ref should not display in the toc.
[2] Hannibal, Missouri.
[3] Thucydides: Greek historian, remembered for his History of the Peloponnesian War.
[4] Technical information: You may wonder why we don't use the convert formula to image feature here to generate the reflected text in HTML. Actually \reflectbox is a command of the pdflatex driver. To convert formulas into images we use the dvips driver because of its higher output quality.