============================ Docutils_ Code Introduction ============================ :Author: John Mulder w/ text borrowed from throughout the docutils docstrings. :Contact: johnmulder@gmail.com :Revision: $Revision: 4977 $ :Date: $Date: 2007-03-01 20:47:56 +0100 (Do, 01. Mär 2007) $ :Copyright: This document has been placed in the public domain. :Abstract: This is the introduction to the Docutils source code :Prerequisites: You will need some basic Python_ knowledge, as well as some understanding of ReStructuredText_. .. _Docutils: http://docutils.sourceforge.net/ .. _Python: http://www.python.org .. _ReStructuredText: http://docutils.sourceforge.net/docs/user/rst/quickstart.html .. contents:: Obtaining the Docutils Code =========================== The latest snapshot of the docutils code is located at sourcforge as a tarball_. Alternatively, you can get direct access to the subversion server as described on the docutils site in the `repository instructions`_. .. _tarball: http://docutils.sourceforge.net/docutils-snapshot.tgz .. _repository instructions: http://docutils.sourceforge.net/docs/dev/repository.html Docutils Flow of Execution ========================== The flow of a document through a docutils utility starts with a `Publisher` object from `docutils/core.py`. The publisher is used as follows: 1. Instantiate the publisher, which in turn instantiates the following: a. The document tree (`docutils.nodes` objects). b. A `docutils.readers.Reader` instance. c. A `docutils.parsers.Parser` instance. d. A `docutils.writers.Writer` instance. 2. Set Components: If reader, parser, or writer objects are not passed to the publisher, check for names to have been passed in and use them instead. If neither are passed in, use defualts. 3. Process settings: ??? 4. Set Source: 5. Set Destination 6. Publish: A. set io:??? B. Call the read function of the Reader i. Scan input text from file, string, or pre-proccessed document tree. Uses a subclass of `Input` in: `docutils/io.py` ii. Parse text into document tree. The parser chosen depends on the document format of the input. Uses a parser in: `docutils/parsers/` iii. Return a document tree to the Publisher. The tree is made up of nodes from: `docutils/nodes.py` C. Call the apply transforms function of the Transformer in: `docutils/transforms/__init__.py` Apply transforms to the document tree as determined by the reader and writer. Uses transforms in: `docutils/transforms/` D. Call the write function of the Writer in: `docutils/writer` a. Takes document tree as input. b. Instantiates a subclass of `docutils.nodes.NodeVisitor` which traverses the doctree using the `Node.walkabout()` function in: `docutils/nodes/nodes.py` Organization of the Docutils Code ================================= Within the docutils directory, the package for docutils is in a subdirectory also called docutils. This contains both modules and subpackages. Modules in Docutils =================== __init__.py ----------- The __init__ module contains base classes and functions that are inherited in other modules throughout the docutils package. core.py ------- The core module contains the `Publisher` object. Calling the ``publish_*`` convenience functions (or instantiating a `Publisher` object) with component names will result in default behavior. For custom behavior (setting component options), create custom component objects first, and pass *them* to ``publish_*``/`Publisher`. See `The Docutils Publisher`_. .. _The Docutils Publisher: http://docutils.sf.net/docs/api/publisher.html frontend.py ----------- Command-line and common processing for Docutils front-end tools. Includes classes which parse options and functions for proccessing those options. io.py ----- I/O classes provide a uniform API for low-level input and output. Subclasses will exist for a variety of input/output mechanisms. nodes.py -------- Docutils document tree element class library. Classes in CamelCase are abstract base classes or auxiliary classes. The one exception is `Text`, for a text (PCDATA) node; uppercase is used to differentiate from element classes. Classes in lower_case_with_underscores are element classes, matching the XML element generic identifiers in the DTD_. The position of each node (the level at which it can occur) is significant and is represented by abstract base classes (`Root`, `Structural`, `Body`, `Inline`, etc.). Certain transformations will be easier because we can use ``isinstance(node, base_class)`` to determine the position of the node in the hierarchy. .. _DTD: http://docutils.sourceforge.net/docs/ref/docutils.dtd statemachine.py --------------- A finite state machine specialized for regular-expression-based text filters. This module is used by the reST parser, but is designed to be of general utility. urischemes.py ------------- `schemes` is a dictionary with lowercase URI addressing schemes as keys and descriptions as values. utils.py -------- Miscellaneous utilities for the documentation utilities. examples.py ----------- Contains practical examples of Docutils client code. Subpackages in Docutils ======================= languages --------- This package contains modules for language-dependent features of Docutils. parsers ------- This package contains Docutils parser modules. :null.py: A module containing a parser which does nothing. This is used when transforming from a pickled document tree to any form. :rst: A subpackage containing the parser for reStructuredText. The reStructuredText parser is implemented as a state machine, examining its input one line at a time. To understand how the parser works, please first become familiar with the `docutils.statemachine` module, then see the `states` module. readers ------- This package contains Docutils Reader modules. Each reader module or package must export a subclass also called 'Reader'. The three steps of a Reader's responsibility are defined: `scan()`, `parse()`, and `transform()`. Call `read()` to process a document. transforms ---------- This package contains modules for standard tree transforms available to Docutils components. Tree transforms serve a variety of purposes: - To tie up certain syntax-specific "loose ends" that remain after the initial parsing of the input plaintext. These transforms are used to supplement a limited syntax. - To automate the internal linking of the document tree (hyperlink references, footnote references, etc.). - To extract useful information from the document tree. These transforms may be used to construct (for example) indexes and tables of contents. Each transform is an optional step that a Docutils component may choose to perform on the parsed document. writers ------- This package contains Docutils Writer modules. Each writer module or package must export a subclass also called 'Writer'. Each writer must support all standard node types listed in `docutils.nodes.node_class_names`. The `write()` method is the main entry point. In the subpackages, each writer is implemented in the `__init__.py` files. Modules in Writer: :docutils_xml: Simple internal document tree Writer, writes Docutils XML. :null: A do-nothing Writer. :pseudoxml: Simple internal document tree Writer, writes indented pseudo-XML. Subpackages in Writer: :html4css1: Simple HyperText Markup Language document tree Writer. The output conforms to the XHTML version 1.0 Transitional DTD (*almost* strict). The output contains a minimum of formatting information. The cascading style sheet "html4css1.css" is required for proper viewing with a modern graphical browser. :latex2e: LaTeX2e document tree Writer. :newlatex2e: LaTeX2e document tree Writer. :pep_html: PEP HTML Writer. :s5_html: S5/HTML Slideshow Writer. .. |---| unicode:: 8212 .. em-dash :trim: .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 78 End: