This is a hacked-up version of Aaron Swartz's html2text that outputs reST.
- converts basic HTML documents to reST (titles, paragraphs, emphasis)
- handles lists (<ul>, <ol>), images, <code> and <pre> blocks
- builds a list of links and adds reST references
- the output is word-wrapped
- tables are not fully supported (cell contents are output, but no borders)
- it sometimes word-wraps URLs or fails to capture the whole address
in a reference
- non-ASCII characters are not handled; they are currently replaced with a ?
- single backticks are put around all references, even single-word ones
- not configurable (optional wrapping, per-chapter references, etc.)
- totally reinvent the thing and write an xml2rst writer (rst2rst ;-) and use
html2rst.py index.html >index.txt
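
Two of the limitations listed above (URLs broken by word-wrapping, and
backticks around single-word references) could be approached roughly as in
the sketch below. This is not code from html2rst.py: the helper names
wrap_keep_urls and rst_reference are hypothetical, the width of 78 is an
assumed default, and the "simple name" pattern is a conservative ASCII-only
reading of the reST rule for reference names that need no backticks.

```python
import re
import textwrap


def wrap_keep_urls(text, width=78):
    """Word-wrap text without ever splitting a long token such as a URL.

    Hypothetical helper; an over-long URL ends up on a line of its own
    (which may exceed `width`) instead of being cut in the middle.
    """
    return textwrap.fill(
        text,
        width=width,
        break_long_words=False,  # do not cut words longer than `width`
        break_on_hyphens=False,  # do not break at hyphens inside URLs
    )


# Assumption: a "simple" reference name is alphanumerics with isolated
# internal '-', '_', '.', ':' or '+' characters (ASCII only here).
_SIMPLE_NAME = re.compile(r"^[A-Za-z0-9]+(?:[-_.:+][A-Za-z0-9]+)*$")


def rst_reference(name):
    """Return a reST hyperlink reference, quoting only when necessary."""
    if _SIMPLE_NAME.match(name):
        return name + "_"
    return "`" + name + "`_"
```

With these helpers, rst_reference("index") gives index_ while
rst_reference("home page") gives `home page`_, and a wrapped paragraph keeps
each URL intact on a single line.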
The html2text script this code is based on is licensed under the GNU GPL 2.
This code is under the same license.