head 1.5; access; symbols RSE:1.1.1.1 vendor:1.1.1; locks; strict; comment @# @; 1.5 date 2002.03.13.09.19.03; author rse; state dead; branches; next 1.4; 1.4 date 2000.11.09.10.07.46; author rse; state Exp; branches; next 1.3; 1.3 date 2000.11.08.22.04.04; author rse; state Exp; branches; next 1.2; 1.2 date 2000.09.26.07.48.43; author rse; state Exp; branches; next 1.1; 1.1 date 2000.09.14.15.51.52; author rse; state Exp; branches 1.1.1.1; next ; 1.1.1.1 date 2000.09.14.15.51.52; author rse; state Exp; branches; next ; desc @@ 1.5 log @add our recent evaluation stuff @ text @ Sugar -- The Markup Language With Invisible Syntactic Sugar =========================================================== Ralf S. Engelschall ,&{br} Christian Reiber ++ | Genesis: | 12-Mar-1999 | | Last Update: | 08-Nov-2000 | Introduction ------------ Sugar is a markup language and corresponding translator tool for writing technical documentation that uses mostly invisible markup tags (the so-called //syntactic sugar// in compiler construction folk terminology). The general idea is that the markup text already looks like the textual output of the translator phase, that is, the Sugar source can already be treated as its text output format ("ASCII WYSIWYG"). Additionally, the Sugar markup language is considered intuitive enough to be recognized easily, so writing technical documentation is mainly just a matter of performing a brain dump. Sugar's syntactic principle is "keep it simple'n'stupid" (KISS) while staying powerful enough to produce high-quality output. Sugar's goal is not to provide all features of a full-featured documentation system. Instead, it provides only a few markup concepts, but those are stretched to the maximum. Sugar Grammar ------------- A Sugar document is described by the following grammar: ++ | | | | | <document> | ::= | <block>* | | <block> | ::= | <tagged-block> \| <regular-block> | | <tagged-block> | ::= | <1d-block> \| <2d-block> | | <1d-block> | ::= | <1d-tag> <document> | | <2d-block> | ::= | <2d-tag> <document> | | <1d-tag> | ::= | "!!##!!"
\| "!!\|\!!|" \| "!!``!!" \| "!!''!!" \| ... | | <2d-tag> | ::= | "!!**!!" \| ... | where <regular-block> is defined visually as a rectangular block of continued text inside the document, that is, a paragraph of text (without any blank lines) where each line starts at the same indentation position. Markup Language --------------- The Sugar markup language consists of markup tags grouped into a few classes: o ..Visual Formatting:.. For visual formatting of text the following <1d-tag>s exist. They can be used either inline in a paragraph by using them twice (to delimit begin and end) or for marking up a whole block by placing them outdented, that is, to the left of the indented block they delimit. ++ | tag | formatting | application | | !!__!! | underline | inline, block | | !!**!! | bold | inline, block | | !!//!! | italics | inline, block | | !!''!! | code | inline, block | | !!>>!! | indented | block | | !!==!! | paragraph header | block | | !!\!\!!! | verbatim | inline, block (charblock only until EOL) | | !![[!! | boxed | inline, block | | !!%%!! | centered | block | | !!))!! | right flushed | block | | !!((!! | left flushed | block | Example: !! This line contains //italics// and **bold** words. >> '' And this paragraph contains indented verbatim code. And this line again is __non-indented__ text. o ..Links and References:.. For referencing textual locations (both document-internal and in external documents), links can be specified. ++ | construct | description | | !!->!![//text//]!!(!!//scheme//!!:!!//path//!!)!! | external hyperlink via URL | | !!->!!//text//!!(!!//ref//!!)!! | internal hyperlink via anchor-name | | !!(+!!//ref//!!+)!! | internal anchor definition | Example: !! Header A (+hA+) -------- This is text of header A. For B see ->header B(hB). For Sugar go to ->(http://www.ossp.org/pkg/sugar/). Header B (+hB+) -------- This is text of header B. For A see ->header A(hA). For other neat things watch ->OSSP(http://www.ossp.org/). o ..Headers:..
Up to four levels of headlines can be marked up by placing the following character sequences (or any number of concatenated repetitions of them) at the end of a text block: ++ | sequence | level | | !!==!! | I. | | !!--!! | II. | | !!~~!! | III. | | !!..!! | IV. | Example: !! A header line ============= o ..List Environments:.. Three types of list environments can be used. They are identified by the first non-blank word in the first line of each list item. For ordered lists the start position is selectable by specifying an explicit digit instead of the generic item character. ++ | construct | alternatives | type | | !!-!! | !!o!!, !!*!! | unordered | | !!-.!! | !!o.!!, !!*.!!, m/[0-9]+\./ | ordered | | !!::!! | | itemized | Example: !! A Sugar list: 1. foo o. bar - baz - foobar o. quux o ..Table Environment:.. A generic table environment is provided for any type of data which has to be rendered in a tabular layout. A table is a <2d-block> starting with a !!++!! tag. The content of the <2d-block> consists of a 2-dimensional table specified by cells. The table cells are separated by ''|'' characters. Every row has to start with a ''|'' at the same horizontal position. The number of columns is indicated by the first table row. This first row can be either a complete (all cells are specified) and regular (its contents are used) row, or (in case the first //regular// row is not a complete one, that is, it has multi-column cell spans), the first row can be an empty row (all cells are specified for indication but are left blank). The ''|'' marks can be placed arbitrarily in each row, but if multi-column spans exist, the surrounding ''|'' marks have to be placed at exactly the same horizontal character position as in the first table row (else the multi-column cells are ambiguous). Empty rows can be indicated by using just the starting ''|'' mark. Example: !!
++ | | | | | foo | !!bar | quux | | | foo | baz bar | quux | | bazfoo fjfjkwq rwrqwd sddksjk | dsdjks | foo | | dsdsds | ____ o ..Special Formatting:.. '' quotemeta ## command (charblock until EOL) `` shell command (charblock until EOL) o ..Escaping and Special Characters:.. -- em-dash (German: "Gedankenstrich") \_ strong blank (prevents line break as in HTML's &nbsp;) \n line break (as in HTML's <br>
) \\ a backslash (there is no block concept!) \X escapes the following character X \{name} o Commands (all charblock until EOL): ##! ##include ##// line comment ##/* block comment begin ##*/ block comment end ##if ##elsif ##endif ##img ... ## [range] Sugar Output Formatting ----------------------- The Sugar transformation tool parses a Sugar source text, transforms it into an internal abstract syntax tree and finally applies a particular output formatting module to it in order to transform the abstract syntax tree into the target markup language. The target language then is either already an end-user document (HTML, Text, etc.) or intended for post-processing by external programs (LaTeX, PDF, etc.). The following outputs are supported: ++ | sugar output | post-processor(s) | final output | | Text | - | Text | | HTML | - | HTML | | Roff | nroff | Text | | Lout | lout | PostScript | | PDF | pdflib | PDF | | LaTeX | latex, dvips | DVI, PostScript, PDF | | XML | docbook | ... | | POD | pod2xxx | ... | @ 1.4 log @*** empty log message *** @ text @@ 1.3 log @*** empty log message *** @ text @d9 1 a9 1 | Last Update: | 26-Sep-2000 | d14 17 a30 8 Sugar is a markup language and corresponding translator tool for technical documentations that uses mostly invisible markup tags (the so-called //syntactic sugar// in compiler construction folk terminology). The general idea is that the markup text looks already like the textual output of the translator phase, that is the Sugar source can be already treated as its text format version. Additionally the Sugar markup language is considered intuitive enough to be recognized easily, so writing technical documentation is mainly just a matter of performing a brain dump. d46 4 a49 3 where <regular-block> is defined visually as a rectangular block of continued text inside the document, that is a paragraph of text (without any blank lines) where each line starts at the same indentation position.
d59 4 a62 4 For visual formatting of text the following <1d-tag> exists. They can be used either inlined in a paragraph by using them twice (to delimit begin and end) or for marking up a whole block by using them in marched-out way (to delimit indented block). d86 1 a86 1 For %%referencing%% textual locations (both document internal and to d89 1 a89 1 ++ | construct d113 3 a115 3 Up to four levels of headlines can be marked up by placing the following character sequences (or any number of concatenated repetitions of them) at the end of a text block: d130 4 a133 4 Three types of list environments can be used. They are identified by the first non-blank word in the first line of each list item. For ordered lists the start position is selectable by specifying an explicit digit instead of the generic item character. d151 19 a169 9 A generic table environment can be used for any type of data which has to be rendered in a tabular layout. A table is a <2d-block> starting with a !!++!! tag. The contents of the <2d-block> consists of a 2-dimensional table specified by cells. The table cells are indicated by ''|'' characters. The number of columns is indicated by the first table row. This first row can be either a complete (all cells are specified) and regular row, or (in case the first regular row is not a complete one, that is has multi-column cells) it can be an empty row (all cells are specified for indication but are left blank). d176 1 a176 1 | d185 2 d188 1 a188 1 o Special Formatting: d194 1 a194 1 o Escaping and Special Characters: d216 22 a237 1 _______________________________________________________________________________ a238 192 1. The scanner recognizes the indentation, strips it away, but uses it to compute the "closing brackets" for the 2d-tags. 2. The scanner also recognizes the differences between 1d and 2d tags, since the parser cannot make any such distinction (spaces/indent are no longer there). 3. The scanner has a look-ahead of 1 line plus its indent. 4.
Parsing headers "(====)" is easy: the scanner only recognizes the "^========", and a tree transformer later rehangs the child sequence " x ..... y

" into "

x ... y", i.e. the transformer goes back to the last paragraph node. If the "text" of a hyperlink is missing, the reference is printed instead. For internal links the text is the chapter and page number (except for HTML, where real hyperlinks exist). Keywords: Whatever | Something ---------- | ----------------------------- Brain Dump | VHIT (Vom Hirn ins Terminal, "from the brain into the terminal") Blabla | ASCII WYSIWYG Design Principles ----------------- 1. KISS for the language (the description fits on one page and is ISO-Latin-1!) 2. KISS for the implementation (code size <= 80KB) 3. We implement only what we _REALLY_ need. 4. Sugar is like Unix: few concepts exist and they are applied consistently 5. Sugar has *no* GUI; it is a filter! Example invocation: $ cat test.txt | sugar --html -otest.html 6. Sugar is stand-alone (except for PostScript), so one does not need 1001 tools for installation 7. Release early, release often (Eric S. Raymond) 8. Every markup can always be expressed unambiguously (= non-magic), though it may then not look as pretty. If one sticks to certain rules, one can use pure ASCII aesthetics in magic mode. Non-magic is always usable and active; magic mode is on by default but can be switched off (via -xx and/or an inline tag). Idea: enter -xx directly in the document a la vi/less What Sugar Is Not ----------------- 1. Sugar is _not_ a word processor or a DTP tool 2. Sugar is not a markup language (the text already is the end product) 3. Sugar's brother is more/less and not nroff (i.e. Sugar is fast!) Field of Application -------------------- 1. Technical documentation for multiple presentation platforms: plain ASCII (= Sugar source), roff/-man (Unix), HTML (= online), PS (= print) 2. Brain dump!
Optional Extra Features ----------------------- - ToC: automatic generation - Numbering of headers - Index - Invoking a macro processor: m4 Tables: ------- o Tables are blocks and are introduced with ++ like any other block, i.e. they end at an outdent or at the same level. o Every table row starts with a |, always in the same column. o The |'s of the first row give the total number and the normative position of the columns. o If the first row consists only of |'s (and no content), it is a _pure_ normative row and does not produce an empty row. Otherwise (row 2, ...) one can of course produce an empty row this way. o Column-separating |'s can be placed anywhere, as long as there are enough of them. o Subsequent columns are marked by their | appearing indented. o Multi-columns are present when fewer |'s occur than the normative row prescribes. The spans are recognized by the position of the |'s, i.e. they have to match the |'s of the normative row. Additionally, the normative row can be repeated any number of times. In doing so only the position of the |'s may change, not their number (obviously!). o Empty rows consist of just one | at the beginning and nothing else. o Normative rows have at least(!) 2 |'s. o Empty rows produce as many |'s in the output as the normative row prescribes. For other layout matters one has to write e.g. ``| \_''. o All character formattings can be used inside a table. o In the normative row, the character formatting tags can be used to specify the formatting of the table columns! 3. Block Concept There are two block concepts: - character block (one-dimensional) and - line block (two-dimensional). The //character block// is opened and closed again by the tag. The end of the paragraph terminates the character block in any case. The //line block// starts with the tag __outdented__, where no blank line has to precede it (a \n and possibly \s before it is sufficient).
It contains whole lines, for as long as the text in the line starts at least two spaces further right than the introducing tag. Attention: tags themselves do __not__ constitute the start of the line! So I can nest line blocks this way. (In other words: what matters is not the left margin of the text file, but the left margin of the enclosing line block.) Automatic reflow by the editor is **no** problem for character blocks, because the tag has no position-dependent meaning (this is also why we rejected the idea that a tag at the start of a line, but not outdented, is terminated at the end of the line). The start tag of a line block is not moved by the editor (if it is any good). Possibly, for certain tags, the end of the char block can also be the end of the line (not the end of the paragraph). This is intended for commands: Go to http://laber.lall This is ## a silly line and I want ##this## to be underlined ''##this is not good## ##this is more intuitive, but means end of command = end of line That would then be a property of the tag, i.e. it then //always// behaves this way (and not sometimes one way and sometimes another). Examples: 1. ''This is an example of a text __in which the second clause is underlined__, although it extends across a line boundary. o. ''__In this case the line block is underlined. This continues until the text is outdented again. Blank lines are no obstacle either. This line ends the line block. o. ''Special case: __This text has no closing 1d-tag. It is then terminated by the end of the paragraph. So from here on there is no more underlining. We did this because otherwise, with forgotten end tags, the rest of the document would be misformatted. o Native-Output-Stuff xxxx ``jdjlasdjajlad`` skd asdk s dsö ksaölkdaös## dfkdjsdal html xxxx ##endif xxxx o Comments ##// ##/* ##*/ 5. Inline images - the source is always a bitmap graphic in GIF format!
(For ASCII: gifscii, for HTML: direct, for PS: gif2ps) ##img xx.gif size=jsjs s=xx - Unicode and UTF-8 MUST be supported right from the start! Idea for homogeneous tags: - any XX tag can be repeated multiple times, i.e. XXXXX is valid as well - any begin XX tag at the end of a paragraph wraps around its scope, i.e. it is applied to the whole paragraph as if it stood at the start of the block (marged out?). Results: - headlines are marked up in the same way as blocks @ 1.2 log @*** empty log message *** @ text @d34 2 a35 2 | <1d-tag> | ::= | "||##||" \| "||\|\|||" \| "||``||" \| "||''||" \| ... | | <2d-tag> | ::= | "||**||" \| ... | d55 11 a65 11 | ||__|| | underline | inline, block | | ||**|| | bold | inline, block | | ||//|| | italics | inline, block | | ||''|| | code | inline, block | | ||>>|| | indented | block | | ||==|| | paragraph header | block | | ||\|\||| | verbatim | inline, block (charblock only until EOL) | | ||[[|| | boxed | inline, block | | ||!!|| | centered | block | | ||))|| | right flushed | block | | ||((|| | left flushed | block | d69 1 a69 1 || This line contains //italics// and **bold** words. d76 1 a76 1 For referencing textual locations (both document internal and to d80 1 a80 1 | ||->||[//text//]||(||//scheme//||:||//path//||)|| | d82 1 a82 1 | ||->||//text//||(||//ref//||)|| | d84 1 a84 1 | ||(+||//ref//||+)|| | d89 1 a89 1 || Header A (+hA+) d108 4 a111 4 | ||==|| | I. | | ||--|| | II. | | ||~~|| | III. | | ||..|| | IV. | d115 1 a115 1 || A header line d126 3 a128 3 | ||-|| | ||o||, ||*|| | unordered | | ||-.|| | ||o.||, ||*.||, m/[0-9]+\./ | ordered | | ||::|| | | itemized | d132 1 a132 1 || A Sugar list: d143 1 a143 1 with a ||++|| tag. The contents of the <2d-block> consists of a d145 1 a145 1 ||\||| characters. The number of columns is indicated by the first table d153 2 a154 1 || ++ | | | | d156 1 a215 1 a259 30 Language Concepts ----------------- 1. Escape concept - per character - (per block) 2.
Simple markup concept - Headers - Level 1-3 - optional automatic numbering - Lists - Ordered - Unordered - Description - Tables - Inline Markup - Underline - Bold - Italics - Code - (Boxed) 3. Block concept - (per paragraph) - per line - per word ??? - per character 4. Reference concept - internal (reference) - external (hyperlink) 5. Native-Output-Markup (if-html) d262 2 a263 1 - ToC d265 1 d267 2 a268 88 1. Escape concept - per character Idea: \ - (per block) Idea: \\ (usable like the normal block concept) 2. Simple markup concept - Headers - Level 1 (Part) Foo Bar Foo Bar Foo Bar Foo Bar =============================== xxxxx - Level 2 (Chapter) Foo Bar Foo Bar Foo Bar Foo Bar ------------------------------- - Level 3 (Section) Foo Bar Foo Bar Foo Bar Foo Bar - - - - - - - - - - - - - - - - - Level 4 (Paragraph) Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar Foo Bar -- dsh hdsjhfdhjf klfhd jkash flhaflhaslfhlfjdha dsh hdsjhfdhjf klfhd jkash flhaflhaslfhlfjdha dsh hdsjhfdhjf klfhd jkash flhaflhaslfhlfjdha dsh hdsjhfdhjf klfhd jkash flhaflhaslfhlfjdha - optional automatic numbering - Lists - Ordered Idea: we allow [o-*+.] _AND_ . and by . the default counting is forced to 1. first o xxxx o xxxx 1. xxxx - xxxx - xxxx 1. xxxx . xxxx o xxxx - Unordered (all three are accepted everywhere: o-*) o xxxx o xxxx - xxxx - xxxx - xxxx - xxxx - Description (both the newline and the leading blank are optional) the word:: xxxxx xxxxx the second word:: xxxxx - Code (a la the TeX idea internally) || $ sjsjfkdjfd $ kfjd kfjdkjf $ xxx - Indentation >> ... - Tables XXX XXX XXXX foo bar quux foo baz quux bar bazfoo bar chrei: rse: ++ | | | | foo | !!bar | quux | foo | baz bar | quux | bazfoo fjfjkwq rwrqwd sddksjk | dsdjks | foo | dsdsds d293 1 a293 2 o All character formattings can be used inside a table, except || (verbatim).
a296 18 - Inline Markup - Header (Paragraph): chrei: --xx xxx xxx-- rse: -- xxx xxx xxx - Underline: chrei: __xxx__ rse: _xxx_ (= underline) - (Boxed): chrei: --- rse: [_xxx_] - Bold: chrei: **xxx** rse: *xxx* (Asterisk is fat) - Italics: chrei: //xxx// rse: /xxx/ (/ from the italic slant direction) - Code: chrei: ""xxx"" rse: |xxx| (courier/teletype is an upright typeface) - Paragraph: chrei/rse: blank line!! - Hard newline rse: ~~ d311 1 a311 1 (In other words: what matters is not the left margin of the text file, but d344 1 a344 1 o. ''Special case: __This text has no closing tag. d349 2 a351 7 4. Reference-Concept(+) - internal (reference) anchor: foo (+Reference-Concept+) bar reference: foo ->Reference-Concept bar - external (hyperlink) reference: foo ->http://wwww bar @ 1.1 log @Initial revision @ text @d5 2 a6 2 Ralf S. Engelschall Christian Reiber d8 2 a9 2 Genesis: March 12th, 1999 Last Update: June 20th, 2000 d14 8 a21 8 Sugar is a markup language and corresponding translator tool for technical documentations which uses mostly invisible markup tags (the so-called "syntactic sugar"). The general idea is that the markup text looks already like the textual output of the translator phase, i.e., the sugar source can be already treated as its text format version. Additionally the Sugar markup language is considered intuitive enough to be recognized easily, so not writing technical documentation because of horrible markup languages is no longer an excuse. d28 12 a39 13 document ::= block* block ::= tagged-block | regular-block tagged-block ::= 1d-block | 2d-block 1d-block ::= 1d-tag document 2d-block ::= 2d-tag document 1d-tag ::= "##" | "||" | "``" | "''" | ... 2d-tag ::= "**" | ... where "regular-block" is defined visually as a rectangular block of continued text inside the document, i.e., a paragraph of text (without any blank lines) where each line starts at the same indentation position.
d44 2 a45 1 The tags of the Sugar markup language are described in the following: d47 1 a47 1 o Formatting: d49 45 a93 25 __ underline (inline, block) ** bold (inline, block) // italics (inline, block) '' code (inline, block) >> indented (block) == paragraph header (block) || verbatim (inline, block) (charblock only until EOL) [[ boxed (inline, block) !! centered (block) )) right flushed (block) (( left flushed (block) o Links and References: -->text(scheme:path) external hyperlink via URL -->text(ref) internal hyperlink via anchor-name (+ref+) internal anchor definition o Headers (by underlining a piece of text): ======= 1.level ------- 2.level - - - - 3.level - - - 4.level d95 29 a123 1 o Lists: (identified by first non-blank character in line) d125 38 a162 16 o - * unordered o. -. *. ordered (digit start selectable with 1., 2., 3., ...) words :: glossary-style o Tables: ++ | | | | foo | !!bar | quux | foo | baz bar | quux | bazfoo fjfjkwq rwrqwd sddksjk | dsdjks | foo | dsdsds d172 1 a172 1 \- em-dash (ger. "Gedankenstrich") d177 1 d518 7 @ 1.1.1.1 log @Create Sugar CVS environment @ text @@