# From setext-list Mon Mar 30 00:06:05 1992
# From: Keepers of The Setext Flame[tm]
# Date: Sun, 29 Mar 23:48:00 1992 +0100 (CET)
# Organization: random design -- any old TTY will do
# X-Bad-Address: no more mail to please
# Subject: two types of setext (sermon #2)

ermons etexts
rmonse textse
monset extser
onsete xtserm
nsetex tsermo
setext sermon
================
920329 #2

Two types of setext
--------------------

  The fact that the format has been optimized for requirements
  (and vagaries!) of online-transported text publications can
  easily lead one to believe that all setexts are by necessity
  confined to pure ASCII (7-bit) text. Actually, it is not so.

  The setext has been named for what it is primarily about,
  structural enhancement of any text, not just that of the ASCII
  text. Therefore none of the typotags chosen for encoding of
  the structure either relies on or cares about whether the text
  being wrapped is of the 7- or 8-bit variety. Both types of
  source documents can be made equally structured, with the only
  difference being their final suitability for the intended
  transport medium.

  If a setext is to be distributed via the 7-bit electronic mail
  then, of course, no other option remains than to make sure
  that it contains nothing but ASCII characters. On the other
  hand, if it's to become part of an otherwise encoded package
  (such as a binhexed archive in which documentation files have
  been setextized) or distributed solely through 8-bit-safe
  means then there may be no clear-cut reason not to use the
  full 8-bit character set where so called for.

Other considerations
---------------------

  Once a decision has been reached to use 8-bit characters in a
  setext a possibility arises to keep the paragraph text
  unwrapped, rather than folded uniformly at the 66th character
  mark.
  After all, if the setext is primarily to be displayed inside
  an editor, rather than on an 80-character terminal screen,
  then there is not much sense in prior folding of the lines to
  a specific guaranteed-to-fit-on-a-TTY-screen length. The
  editor/ word processor program will fit the unwrapped text to
  window or the available display area, and might actually
  prefer to have to deal with whole unwrapped paragraphs rather
  than with otherwise relatively-short lines.

  Most text-processing programs with native word-wrap
  capabilities actually consider return-terminated lines to be
  paragraphs in their own right. Thus, if a setext is not to
  travel via email anyway (because of it being distributed
  differently or making use of accented characters) then it
  might as well arrive in unfolded state so that no extra time
  need be spent on making the paragraphs "whole again."

  Do observe, however, that it is not the state of the paragraph
  text that makes or breaks a setext. No, the sole criterion of
  whether a text is a setext is the presence of at least one
  verified subhead, as described in sermon #1. Thus even texts
  with unfolded paragraphs (i.e. terminated by carriage returns,
  equal to lines in HyperTalk that can be up to 30000 characters
  in length) are setexts if they contain at least one
  subhead-tt.

  The sole mechanism used in setext to encode which of such
  lines are in reality paragraphs (as opposed to those that
  shouldn't be folded mechanically) is the character indent. In
  fact, after the subhead-tt the second most important typotag
  is the indent-tt, made up of exactly two space characters,
  which denotes any such indented lines as ready-candidates for
  reflowing by so inclined front-ends (either on their own or as
  part of like-indented lines above and below it). So any
  potentially-long line of a setext that has been indent-tted
  will be understood (by any validated setext front-end) as to
  be ready for wrapping-to-length if so required.
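
  The reflowing rule described above can be sketched in a few
  lines of Python. This is only an illustration, not a validated
  setext front-end: the function names are made up for the
  example, the 66-column width is borrowed from the fold mark
  mentioned earlier, and the subhead test (a non-blank line
  followed by a line of dashes) is a simplifying assumption,
  since the exact definition lives in sermon #1.

```python
import re
import textwrap

INDENT = "  "      # the indent-tt: exactly two space characters
FOLD_WIDTH = 66    # fold mark mentioned in the text (used here as a default)

# Assumption: a subhead underline is a line made up solely of dashes.
_UNDERLINE = re.compile(r"^-{2,}\s*$")


def has_subhead(lines):
    """True if some non-blank line is directly followed by a dash underline."""
    return any(
        bool(text.strip()) and bool(_UNDERLINE.match(under))
        for text, under in zip(lines, lines[1:])
    )


def is_indent_tted(line):
    """True if the line starts with exactly two spaces followed by text."""
    return line.startswith(INDENT) and len(line) > 2 and not line[2].isspace()


def reflow(lines, width=FOLD_WIDTH):
    """Join runs of indent-tted lines and re-fold them to the given width.

    All other lines (subheads, underlines, quotes, blanks) pass through as-is.
    """
    out, run = [], []

    def flush():
        if run:
            text = " ".join(part.strip() for part in run)
            out.extend(
                textwrap.wrap(
                    text,
                    width=width,
                    initial_indent=INDENT,
                    subsequent_indent=INDENT,
                    break_long_words=False,
                    break_on_hyphens=False,
                )
            )
            run.clear()

    for line in lines:
        if is_indent_tted(line):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return out
```

  Lines without the indent-tt are left alone, so a front-end
  built along these lines would re-fold only what the author
  has marked as paragraph text.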

An example of unwrapped paragraph
----------------------------------

  For instance, this paragraph has specifically been unfolded to demonstrate the validity of the concept: indent-tted, yet still-unwrapped line in a setext piece makes it into a paragraph of its own. Depending on the type of the terminal software at your end it will most probably be folded at some "mechanical 79-th character mark" and fill the available window's width. Imported into a word processor it will end up word-wrapped and thus become easy to add to or delete from, since the program will simply reflow the whole "multi-line" paragraph as needed. And yet this paragraph, made up of 5 sentences, 1037 characters in all, is in reality just one long line of text with a sole carriage return terminating character at end. Please observe that the initial (and _sole_) indent-tt in this paragraph makes a nice mark of where the subhead ends and the paragraph begins. Not even badly-written reflowing routines will fail to notice that a piece of text is indented, and therefore, most probably, not part of other text around it.

Primary setext-type distinction
--------------------------------

  Still, if all the texts that fulfill the sole basic
  validated-subhead requirement are to be considered setexts
  then the need arises to distinguish between the "pure" variety
  of them, the _rigidly_formatted_ ones encoded for 7-bit
  transport duty (=no accented characters AND with ready-folded
  paragraphs), and all the other ones. Indeed, this important
  point has also been addressed.

  As originally explained, setext documents in online
  distribution should be denoted by the ".etx" suffix (which
  stands for both "emailable" and "enhanced text.") In reality
  this suffix should ONLY be used for the rigidly-formatted,
  "pure" setexts, as is the case with TidBITS that carry it at
  sumex and elsewhere.
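
  The two conditions for a rigidly-formatted setext, 7-bit
  characters only and ready-folded paragraphs, lend themselves
  to a mechanical check. A minimal Python sketch follows; the
  function name is hypothetical, and treating "ready-folded" as
  "no line longer than 66 characters" is an assumption based on
  the fold mark mentioned earlier.

```python
FOLD_WIDTH = 66  # assumed maximum line length for a ready-folded setext


def is_rigidly_formatted(text, width=FOLD_WIDTH):
    """True if the text is pure 7-bit ASCII and no line exceeds `width`."""
    seven_bit = all(ord(ch) < 128 for ch in text)
    folded = all(len(line) <= width for line in text.splitlines())
    return seven_bit and folded
```

  Only a text for which such a check succeeds would qualify for
  the ".etx" suffix; the check says nothing about whether the
  text is a setext in the first place, which still depends on
  the presence of a subhead.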

  All the other setexts, either the not-fully-7-bit ones, or
  those with unfolded paragraphs (as the one above) **may** carry
  a more common ".txt" suffix, but not an ".etx" one. They are
  setexts too but as they definitely are _not_ guaranteed to
  fare well in electronic mail transport then their titles
  should not signal that special "setext.etx" status either (to
  readers and front-ends that are aware of the distinction). It
  is enough that their titles be indicative of them being
  simply "text_documents.txt."

  Therefore: fully 7-bit/ 66-char-folded setexts may carry the
  .etx suffix. All the other setexts: .txt ONLY suffix, please,
  as does this issue of the sermon (on account of the unwrapped
  paragraph).._.

Change of topic: delete functions
----------------------------------

  Akif Eyler, who's adapting an existing document browser for
  setexts (a MacApp hack), writes about my suggestion to allow
  selective deletion of parts of a browsed setext:

> We are talking about a browser, not an editor. I don't think
> text modification should be allowed here.

  This is an important point that deserves a little more comment
  than that. I believe that we may be talking about two
  different things. Although it is true that a "browser" is
  basically an application for structured paging of text there
  is nothing to prevent such applications from offering
  additional services that may be appropriate or of great value
  to their users. So while technically we both may be right in
  this respect, when speaking of "browsers" I mean "setext
  tools" rather than the more traditional,
  straight-browse-function-only implementations.

  You should keep in mind the basic difference between a
  traditional browser, designed for navigation in a potentially
  VERY LARGE data mass, and a setext front-end, meant to be used
  with texts of limited size (<50K or so). Such short texts are
  inherently easier to read without assistance of any special
  tools, even if using one is to be recommended.
  But let us not believe that people will start using a browser
  only because one is available. After all, a "typical" setext
  might contain 20-odd "pages" of text, which are not that hard
  to browse using the standard ways and means of ANY application
  (scrollbars on the Mac etc). For that reason alone I feel that
  setext browsers should offer a few other facilities in order
  to make using them worthwhile to the users... with the prime
  among such value-added functions being a method to delete
  portions of read setexts, in as simple a manner as possible.

  Please bear in mind that setexts are not guaranteed to survive
  editing at the receiving end anyway and that the format as
  such is intended mainly for periodic online publications. And
  what do we do with interesting articles in print magazines? We
  clip them out and discard the rest. Thus an easy delete
  function wouldn't entirely be out of place in a browser used
  for reading periodic, time-topical online publications etc (no
  doubt it would be an unwanted feature when browsing [large]
  _reference_ works etc).

  So what constitutes such an "Easy-Delete"? In my view a
  browser should allow _unobtrusive_ deletion of text in either
  or both of the following ways:

  (a) a whole current topic may be flagged for deletion. The
  deletion itself does not take place until browsing of the
  setext is terminated, however (by reading in another setext or
  closing the application), at which time the originally-read-in
  setext gets written back to disk less the flagged portions. It
  goes without saying that such flagging actions should be
  undoable before the rewriting takes place. The rewriting
  should happen automatically, without prior and explicit
  "replace-confirmation?" dialogs.

  (b) selected text-chunk onscreen may simply be deleted by
  pressing the delete or clear keys. It should disappear from
  the display at once but _could_ be removed from file only upon
  termination of the browsing-of-current-setext operation (as
  per above).
  This function does not have to be undoable as the text may be
  preserved by reading-in of it once again. I.e. the browser
  keeps track of opened file(s) and if one such is opened again
  then it simply forgets about any queued selective deletes in
  current text and replaces the contents of the buffer with a
  clean copy from the disk. As above, no explicit confirmation
  should be required first. If you delete something then you
  delete something, and there should be no need to explain it
  once again to the machine that you did it on purpose.

  On the other hand care should be taken to prevent inadvertent
  destruction of browsed setexts, perhaps by requiring that
  selection of text be made with the option (or command) key
  pressed down, or that Command-Delete be required to flag a
  whole topic for removal. On top of that the really-important
  documents, those not intended to be rewritten during browsing,
  should be kept locked on disk anyway.

------------------------------------------> end of setext sermon #2
edited <----- Ian Feldman
inquiries --> setext-list@random.se
------------> setext, the structure-enhanced text concepts document
(last changed Aug 1992; do not reorder if you already have seen it)
may be requested by sending "setext" alone on the Subject: line, no
quotes, empty message body to -----------> ..