=======================
 Character Entity Sets
=======================

:Author: David Goodger
:Contact: goodger@users.sourceforge.net
:Revision: $Revision$
:Date: $Date: 2003/06/30$
:Copyright: This document has been placed in the public domain.

The files in this directory contain reStructuredText substitution definitions for character entity sets from the ISO 8879 & ISO 9573-13 (combined), MathML, and HTML4 standards.  They were generated by the ``tools/unicode2rstsubs.py`` program from the input file ``unicode.xml``, which is maintained as part of the MathML 2 Recommendation XML source, available at <http://www.w3.org/Math/characters/unicode.xml>.
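
The generator program is not reproduced here.  As a rough illustration of the output format only, the Python sketch below turns a hypothetical mapping of entity names to code points into substitution definitions shaped like the ones in these files.  It does not parse ``unicode.xml``; the names ``ENTITIES``, ``format_definition`` and ``format_entity_set`` are invented for this example, and taking descriptions from the standard-library ``unicodedata`` module is an assumption (the real files take theirs from ``unicode.xml``).

```python
import unicodedata

# Hypothetical input: entity name -> sequence of code points.
# The real generator reads this information from unicode.xml instead.
ENTITIES = {
    "copy": [0x00A9],
    "trade": [0x2122],
    "reg": [0x00AE],
}


def format_definition(name, codepoints):
    """Return one reStructuredText substitution definition for an entity."""
    # Five hex digits, matching the "U+000A9" style used in these files.
    codes = " ".join(f"U+{cp:05X}" for cp in codepoints)
    # Assumption: describe the entity by its Unicode character name(s).
    description = " + ".join(unicodedata.name(chr(cp)) for cp in codepoints)
    return f".. |{name}| unicode:: {codes} .. {description}"


def format_entity_set(entities):
    """Return the definitions for a whole entity set, sorted by name."""
    return "\n".join(format_definition(name, cps)
                     for name, cps in sorted(entities.items()))


if __name__ == "__main__":
    print(format_entity_set(ENTITIES))
```

Each output line has the form ``.. |copy| unicode:: U+000A9 .. COPYRIGHT SIGN``; the real files additionally align the ``unicode::`` column, which this sketch does not attempt.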

====================  ==================================================
Entity Set File       Description
====================  ==================================================
html4-lat1.txt        HTML Latin 1
html4-special.txt     HTML Special Characters
html4-symbol.txt      HTML Mathematical, Greek and Symbolic Characters
isoamsa.txt           Added Mathematical Symbols: Arrows
isoamsb.txt           Added Mathematical Symbols: Binary Operators
isoamsc.txt           Added Mathematical Symbols: Delimiters
isoamsn.txt           Added Mathematical Symbols: Negated Relations
isoamso.txt           Added Mathematical Symbols: Ordinary
isoamsr.txt           Added Mathematical Symbols: Relations
isobox.txt            Box and Line Drawing
isocyr1.txt           Russian Cyrillic
isocyr2.txt           Non-Russian Cyrillic
isodia.txt            Diacritical Marks
isogrk1.txt           Greek Letters
isogrk2.txt           Monotoniko Greek
isogrk3.txt           Greek Symbols
isogrk4.txt [1]_      Alternative Greek Symbols
isolat1.txt           Added Latin 1
isolat2.txt           Added Latin 2
isomfrk.txt [1]_      Mathematical Fraktur
isomopf.txt [1]_      Mathematical Openface (Double-struck)
isomscr.txt [1]_      Mathematical Script
isonum.txt            Numeric and Special Graphic
isopub.txt            Publishing
isotech.txt [1]_      General Technical
mmlalias.txt          MathML aliases for entities from other sets
mmlextra.txt [1]_     Extra names added by MathML
====================  ==================================================

.. [1] There is a ``*-wide.txt`` variant for each of these character entity set files, containing characters outside of the Unicode basic multilingual plane or BMP (wide-Unicode; code points greater than U+FFFF).  Most pre-built Python distributions before Python 3.3 were "narrow" and did not fully support wide-Unicode characters, although Python could be built with wide-Unicode support (consult the Python build instructions for details).  Since Python 3.3, every build supports the full Unicode range.
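
Whether a character belongs in one of the ``*-wide.txt`` variants depends only on its code point, and whether the running interpreter can handle it is reported by the standard-library value ``sys.maxunicode`` (``0xFFFF`` on narrow builds, ``0x10FFFF`` otherwise).  The Python 3 sketch below checks both; the helper names ``is_wide`` and ``interpreter_is_wide`` are illustrative, not part of Docutils.

```python
import sys

BMP_MAX = 0xFFFF  # Highest code point in the basic multilingual plane.


def is_wide(char):
    """Return True if the character lies outside the BMP (above U+FFFF)."""
    return ord(char) > BMP_MAX


def interpreter_is_wide():
    """Return True if this interpreter represents code points above U+FFFF
    as single characters (sys.maxunicode is 0xFFFF on narrow builds)."""
    return sys.maxunicode > BMP_MAX


if __name__ == "__main__":
    fraktur_a = "\U0001D504"  # MATHEMATICAL FRAKTUR CAPITAL A
    copyright_sign = "\u00A9"  # COPYRIGHT SIGN
    print(is_wide(fraktur_a), is_wide(copyright_sign), interpreter_is_wide())
```

On any Python 3.3 or later interpreter, ``interpreter_is_wide()`` is always true, so the check only matters for older builds.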

These character entity sets can be used in documents via the "include" directive and substitution references.  For example::

    .. include:: isonum.txt

    Copyright |copy| 2003 by John Q. Public, all rights reserved.

Individual definitions can also be copied from these entity set files and pasted into documents.  This has two advantages: it removes the dependency on an external file, and it avoids processing definitions for characters that are never used.  However, if more than a few character entities are defined this way, they clutter the document.
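
Selecting which definitions to copy can be automated.  The Python below is a minimal sketch, assuming the entity set file and the document are both available as strings: it collects the ``|name|`` references in the document and keeps only the matching definition lines.  The regular expressions are simplified assumptions (they ignore multi-line definitions, escaped bars and the full inline-markup recognition rules), and ``used_definitions`` is a hypothetical helper, not part of Docutils.

```python
import re

# Simplified patterns (assumptions): a substitution reference is |name|,
# and a definition is a single line starting with ".. |name|".
REFERENCE = re.compile(r"\|([^|\s][^|]*)\|")
DEFINITION = re.compile(r"^\.\. \|([^|]+)\|")


def used_definitions(entity_file_text, document_text):
    """Return the definition lines from an entity set file whose
    substitution names are referenced in the document."""
    used = set(REFERENCE.findall(document_text))
    kept = []
    for line in entity_file_text.splitlines():
        match = DEFINITION.match(line)
        if match and match.group(1) in used:
            kept.append(line)
    return kept


if __name__ == "__main__":
    entity_text = (
        ".. |copy|   unicode:: U+000A9 .. COPYRIGHT SIGN\n"
        ".. |reg|    unicode:: U+000AE .. REGISTERED SIGN\n"
        ".. |trade|  unicode:: U+02122 .. TRADE MARK SIGN\n"
    )
    document = "Copyright |copy| 2003, BogusMegaCorp\\ |trade|."
    print("\n".join(used_definitions(entity_text, document)))
```

The kept lines can then be pasted into the document in place of the "include" directive.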

Substitution references must be separated from the surrounding text by whitespace or punctuation.  To place a character directly against adjacent text, use the disappearing-whitespace escape sequence, backslash-space::

    .. include:: isonum.txt

    Copyright |copy| 2003, BogusMegaCorp\ |trade|.

The "unicode" directive can be used as well.  Whitespace between its arguments is ignored and removed, effectively squeezing the text together, and text following ``..`` is a comment::

    .. |copy|   unicode:: U+000A9 .. COPYRIGHT SIGN
    .. |BogusMegaCorp (TM)| unicode:: BogusMegaCorp U+2122
       .. with trademark sign

    Copyright |copy| 2003, |BogusMegaCorp (TM)|.
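
The squeezing behaviour can be approximated in a few lines.  The Python below is a simplified sketch, not the Docutils implementation: it assumes the directive's arguments arrive as one string, treats ``U+``- or ``0x``-prefixed hexadecimal tokens and plain decimal numbers as character codes, keeps every other token as literal text, drops the whitespace between tokens, and stops at the first ``..`` comment marker.  Docutils accepts further code formats that this sketch does not handle, and ``expand_unicode_args`` is a hypothetical name.

```python
import re

# Hex code with a "U+" or "0x" prefix (case-insensitive), e.g. U+000A9.
HEX_CODE = re.compile(r"(?:U\+|0x)([0-9a-f]+)$", re.IGNORECASE)
DECIMAL_CODE = re.compile(r"[0-9]+")


def expand_unicode_args(arguments):
    """Approximate the text produced by a "unicode" directive's arguments."""
    pieces = []
    for token in arguments.split():
        if token == "..":  # Everything after ".." is a comment.
            break
        match = HEX_CODE.match(token)
        if match:
            pieces.append(chr(int(match.group(1), 16)))
        elif DECIMAL_CODE.fullmatch(token):  # Decimal code point.
            pieces.append(chr(int(token)))
        else:  # Other text is kept as-is.
            pieces.append(token)
    # Whitespace between arguments is dropped, squeezing the text together.
    return "".join(pieces)


if __name__ == "__main__":
    print(expand_unicode_args("BogusMegaCorp U+2122\n   .. with trademark sign"))
```

Joining with an empty string is what makes ``BogusMegaCorp U+2122`` come out as a single word followed immediately by the trademark sign.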