RfD: Escaped Strings (Version 6)




Postby Peter Knaggs » Tue, 25 Nov 2008 04:34:03 GMT

At the last meeting, the committee requested a change before we take
this to a CfV, so here is the updated version. It also includes my new
contact details.

RfD: Escaped Strings S\"
23 November 2008, Stephen Pelc/Peter Knaggs

20081123 Replaced description of \" (now the same as for \q).
         Replaced the test cases with tests that do not assume
         the word can be used in interpretation mode.
         In keeping with the definition.
20071030 Clarification of case sensitivity:
         Escape character is case sensitive,
         Hex digits are not.
20070913 Added clarifications.
20070719 Modified ambiguous condition.
         Added ambiguous conditions to definition of S\".
         Added test cases.
         Corrected Reference Implementation.
20070712 Redrafted non-normative portions.
20060822 Updated solution section.
20060821 First draft.


The word S" 6.1.2165 is the primary word for generating strings.
In more complex applications, it suffers from several deficiencies:
1) the S" string can only contain printable characters,
2) the S" string cannot contain the '"' character,
3) the S" string cannot be used with wide characters as discussed
   in the Forth 200x internationalisation and XCHAR proposals.

Current practice
At least SwiftForth, gForth and VFX Forth support S\" with very
similar operations. S\" behaves like S", but uses the '\' character
as an escape character for the entry of characters that cannot be
used with S".

This technique is widespread in languages other than Forth.

It is beneficial in areas such as:
1) construction of multiline strings for display by operating
   system services,
2) construction of HTTP headers,
3) generation of GSM modem and Telnet control strings.
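
As a rough illustration of uses 1) and 2), assuming a system that
provides S\" with the escapes surveyed in this RfD, the two words below
return an HTTP-style header and a two-line display string. The names
http-header and banner are hypothetical. S\" is used only inside colon
definitions, since interpretation semantics are not assumed.

```forth
\ Sketch only: assumes S\" with the \m, \t, \q and \n escapes is available.

\ An HTTP-style header line followed by the empty line that ends the
\ header block; \m is the CR/LF pair.
: http-header  ( -- c-addr u )
  S\" Content-Type: text/plain\m\m" ;

\ A two-line message containing a tab and a quoted word.
: banner  ( -- c-addr u )
  S\" Name:\t\qForth\q\nDone" ;

http-header TYPE
banner TYPE CR
```

With plain S" the same strings would have to be assembled from several
pieces, with the control characters and quotes emitted separately.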

The majority of current Forth systems contain code, either in the
kernel or in application code, that assumes char=byte=au. To avoid
breaking existing code, we have to live with this practice.

The following list describes what is currently available in the
surveyed Forth systems that support escaped strings.

\a           BEL (alert, ASCII 7)
\b           BS (backspace, ASCII 8)
\e           ESC (escape, ASCII 27)
\f           FF (form feed, ASCII 12)
\l           LF (line feed, ASCII 10)
\m           CR/LF pair (ASCII 13, 10) - for HTML etc.
\n           newline - CRLF for Windows/DOS, LF for Unices
\q           double-quote (ASCII 34)
\r           CR (carriage return, ASCII 13)
\t           HT (horizontal tab, ASCII 9)
\v           VT (vertical tab, ASCII 11)
\z           NUL (no character, ASCII 0)
\"           double-quote (ASCII 34)
\[0-7]+      Octal numerical character value, finishes at the
             first non-octal character
\x[0-9a-f]+  Hex numerical character value, finishes at the
             first non-hex character
\\           backslash itself
\            before any other character represents that character
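
One way to check a system against the list above is to print the
character codes that S\" produces. The sketch below assumes
char=byte=au and a system that supports these escapes; .codes and
sample are hypothetical names.

```forth
\ Print the code of each character of a string in decimal.
: .codes  ( c-addr u -- )
  BASE @ >R DECIMAL
  0 ?DO  DUP I + C@ .  LOOP  DROP
  R> BASE ! ;

\ BEL, HT, double-quote, hex 41 (the letter A), NUL and a backslash.
\ The \x41 escape ends at the following backslash.
: sample  ( -- c-addr u )
  S\" \a\t\q\x41\z\\" ;

sample .codes   \ should print: 7 9 34 65 0 92
```

The hex escape is deliberately followed by another escape, so the
result does not depend on how many hex digits a system consumes.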

We are trying to integrate several issues:

1) no/least code breakage
2) minimal standards changes
3) variable width character sets
4) small system functionality

Item 1) is about the common char=byte=au assumption.
Item 2) includes the use of COUNT to step through memory and the
impact of char in the file word sets.
Item 3) has to rationalise a fixed width serial/comms channel

Re: RfD: Escaped Strings (Version 6)

Postby The Beez' » Wed, 26 Nov 2008 18:11:28 GMT

I think a filter would work just as well. BTW, embedding the hex
number escape sequence in this standard bloats the code unnecessarily.
IMHO the two filters below do the same job (with some added
functionality) and can be loaded only when needed.

Hans Bezemer

\ 4tH library - S>EMBED - Copyright 2008 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ Typical use: S" [#33=>$21]&41;" S>EMBED

[UNDEFINED] s>embed [IF]
[UNDEFINED] >number [IF] include lib/tonumber.4th [THEN]
[UNDEFINED] row     [IF] include lib/row.4th [THEN]

create \embed
  char % , 2 ,
  char & , 8 ,
  char # , 10 ,
  char $ , 16 ,
  null ,

:this \embed does> 2 row dup >r if nip cell+ @c else drop then r> ;

: s>embed                              ( a1 n1 -- a1 n2)
  ['] nkey= is key=                    \ set ROW datatype
  over dup >r >r                       \ setup loop parameters
  begin                                \ begin loop
    dup                                \ limit reached?
  while                                \ if not
    over c@ >r chop r> \embed          \ if embedded, convert number
    if base @ >r base ! 0 -rot >number rot r> base ! then
    r@ c! r> char+ >r                  \ save and update parameters
  repeat                               \ next character
  drop drop r> r> tuck -               \ cleanup stacks
;

hide \embed
[THEN]

\ 4tH library - S>ESCAPE - Copyright 2008 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ It allows you to convert an S" string containing escape characters
\ to an escaped string.

\ Typical use: S" \qThis is the end\q" S>ESCAPE

[UNDEFINED] S>ESCAPE [IF]              \ Table of translations for \a..
offset \escape                         ( n -- c)
         7 c,    \ \a
         8 c,    \ \b
    char c c,    \ \c
    char d c,    \ \d
        27 c,    \ \e
        12 c,    \ \f
    char g c,    \ \g
    char h c,    \ \h
    char i c,    \ \i
    char j c,    \ \j
    char k c,    \ \k
        10 c,    \ \l
    char m c,    \ \m
        10 c,    \ \n (Unices only)
    char o c,    \ \o
    char p c,    \ \p
    char " c,    \ \q
        13 c,    \ \r
    char s c,    \ \s
         9 c,    \ \t
    char u c,    \ \u
        11 c,    \ \v
    char w c,    \ \w
    char x c,    \ \x
    char y c,    \ \y
         0 c,    \ \z

: s>escape                             ( a1 n1 -- a1 n2)
  over dup >r >r                       \ setup loop parameters
  begin                                \ begin loop
    dup                                \ limit reached?
  while                                \ if not
    over c@ dup [char] \ =             \ is it a backslash
    if                                 \ is it an escaped character
      drop chop over c@ dup [char] a - max-n and 26 <
      if [char] a - \escape then       \ if so, escape character
    then
    r@ c! chop r> char+ >r             \ save and update parameters
  repeat                               \ next character
  drop drop r> r> tuck -               \ cleanup stacks
;

hide \escape
[THEN]
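
The filters above rely on 4tH words such as CHOP, OFFSET and HIDE that
are not part of ANS Forth. As a rough sketch of the same filter idea in
standard Forth (CORE plus CORE EXT), the word below rewrites a few
single-letter escapes in place. The names xlat, unescape and demo are
hypothetical, only \n, \t, \q and \\ are handled, char=byte=au is
assumed, and a string ending in a lone backslash is not guarded
against. Because a program may not alter a string returned by S", the
example copies it to PAD first.

```forth
\ Map the character that follows a backslash to its replacement.
: xlat  ( c1 -- c2 )
  CASE
    [CHAR] n OF 10 ENDOF     \ line feed
    [CHAR] t OF  9 ENDOF     \ horizontal tab
    [CHAR] q OF 34 ENDOF     \ double-quote
    DUP                      \ anything else stands for itself
  ENDCASE ;

\ Translate escapes in place; the result is never longer than the input.
: unescape  ( c-addr u1 -- c-addr u2 )
  OVER + OVER DUP                    ( start end src dst )
  BEGIN  2 PICK 2 PICK U>  WHILE     \ while src is below end
    OVER C@                          ( start end src dst c )
    DUP [CHAR] \ = IF
      DROP SWAP CHAR+ SWAP           \ step src over the backslash
      OVER C@ xlat                   \ fetch and translate the next char
    THEN
    OVER C!                          \ store the character at dst
    CHAR+ SWAP CHAR+ SWAP            \ advance src and dst
  REPEAT
  NIP NIP OVER - ;                   ( start u2 )

: demo  ( -- c-addr u )
  S" a\tb\qc"                        \ 7 characters, backslashes literal
  PAD SWAP DUP >R CMOVE              \ copy to PAD before modifying
  PAD R> unescape ;                  \ 5 characters remain

demo TYPE CR
```

A filter like this can be loaded only when needed, at the cost of
running at execution time rather than when the string is compiled.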
