RfD: Escaped Strings (Version 6)

Postby Peter Knaggs » Tue, 25 Nov 2008 04:34:03 GMT

At the last meeting, the committee requested a change before we take
this to a CfV, so here is the updated version. It also includes my new
contact details.

RfD: Escaped Strings S\"
23 November 2008, Stephen Pelc/Peter Knaggs

20081123 Replaced description of \" (now the same as for \q).
         Replaced the test cases with tests that do not assume
         the word can be used in interpretation mode.
         In keeping with the definition.
20071030 Clarification of case sensitivity:
         Escape character is case sensitive,
         Hex digits are not.
20070913 Added clarifications.
20070719 Modified ambiguous condition.
         Added ambiguous conditions to definition of S\".
         Added test cases.
         Corrected Reference Implementation.
20070712 Redrafted non-normative portions.
20060822 Updated solution section.
20060821 First draft.

Rationale
=========

Problem
-------
The word S" 6.1.2165 is the primary word for generating strings.
In more complex applications, it suffers from several deficiencies:
1) the S" string can only contain printable characters,
2) the S" string cannot contain the '"' character,
3) the S" string cannot be used with wide characters as discussed
in the Forth 200x internationalisation and XCHAR proposals.
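
To illustrate deficiency 2), here is a minimal sketch of what it takes to build a string containing '"' using only S" and standard words. The names qbuf, qlen, qbuf+, qbuf-append and quoted-hi are hypothetical and not part of the proposal, and the sketch assumes one character per address unit.

```forth
\ Hypothetical scratch buffer and length cell (not part of the RfD).
CREATE qbuf 32 CHARS ALLOT
VARIABLE qlen   0 qlen !

\ Append one character to qbuf.
: qbuf+ ( c -- )   qbuf qlen @ CHARS + C!   1 qlen +! ;

\ Append a whole string to qbuf, one character at a time.
: qbuf-append ( c-addr u -- )   0 ?DO  COUNT qbuf+  LOOP  DROP ;

\ Assemble the four characters "hi" (quotes included) by hand.
: quoted-hi ( -- c-addr u )
  0 qlen !
  [CHAR] " qbuf+
  S" hi" qbuf-append
  [CHAR] " qbuf+
  qbuf qlen @ ;

quoted-hi TYPE        \ displays "hi" with the surrounding quotes
```

With an escaped string word the whole of quoted-hi collapses to a single string literal, which is the motivation for the proposal.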

Current practice
----------------
At least SwiftForth, gForth and VFX Forth support S\" with very
similar operations. S\" behaves like S", but uses the '\' character
as an escape character for the entry of characters that cannot be
used with S".

This technique is widespread in languages other than Forth.

It has benefit in areas such as
1) construction of multiline strings for display by operating
system services,
2) construction of HTTP headers,
3) generation of GSM modem and Telnet control strings.

The majority of current Forth systems contain code, either in the
kernel or in application code, that assumes char=byte=au. To avoid
breaking existing code, we have to live with this practice.
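
As a hedged illustration of the char=byte=au assumption, the first word below steps through a string with 1+ and so is correct only where one character occupies one address unit; the second uses CHAR+ and does not depend on that. Both word names are hypothetical.

```forth
\ Common in existing code: 1+ assumes one char = one address unit.
: .codes ( c-addr u -- )
  0 ?DO  DUP C@ .  1+  LOOP  DROP ;

\ Same loop without that assumption: CHAR+ advances by one character.
: .codes-portable ( c-addr u -- )
  0 ?DO  DUP C@ .  CHAR+  LOOP  DROP ;

: demo ( -- )   S" AB" .codes-portable ;
demo                  \ displays 65 66 on an ASCII system in DECIMAL
```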

The following list describes what is currently available in the
surveyed Forth systems that support escaped strings.

\a            BEL (alert, ASCII 7)
\b            BS (backspace, ASCII 8)
\e            ESC (escape, ASCII 27)
\f            FF (form feed, ASCII 12)
\l            LF (line feed, ASCII 10)
\m            CR/LF pair (ASCII 13, 10) - for HTML etc.
\n            newline - CRLF for Windows/DOS, LF for Unices
\q            double-quote (ASCII 34)
\r            CR (carriage return, ASCII 13)
\t            HT (horizontal tab, ASCII 9)
\v            VT (vertical tab, ASCII 11)
\z            NUL (no character, ASCII 0)
\"            double-quote (ASCII 34)
\[0-7]+       Octal numerical character value, finishes at the
              first non-octal character
\x[0-9a-f]+   Hex numerical character value, finishes at the
              first non-hex character
\\            backslash itself
\             before any other character represents that character
Considerations
--------------
We are trying to integrate several issues:

1) no/least code breakage
2) minimal standards changes
3) variable width character sets
4) small system functionality

Item 1) is about the common char=byte=au assumption.
Item 2) includes the use of COUNT to step through memory and the
impact of char in the file word sets.
Item 3) has to rationalise a fixed width serial/comms channel

Re: RfD: Escaped Strings (Version 6)

Postby The Beez' » Wed, 26 Nov 2008 18:11:28 GMT


I think a filter would work just as well. BTW, embedding the hex
number escape sequence in this standard bloats the code unnecessarily.
These two filters work just as well IMHO and can be loaded when needed
(with some added functionality).

Hans Bezemer

---8<---
\ 4tH library - S>EMBED - Copyright 2008 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ Typical use: S" [#33=>$21]&41;" S>EMBED

[UNDEFINED] s>embed [IF]
[UNDEFINED] >number [IF] include lib/tonumber.4th [THEN]
[UNDEFINED] row     [IF] include lib/row.4th [THEN]

create \embed
  char % , 2 ,
  char & , 8 ,
  char # , 10 ,
  char $ , 16 ,
  null ,

:this \embed does> 2 row dup >r if nip cell+ @c else drop then r> ;

: s>embed                              ( a1 n1 -- a1 n2)
  ['] nkey= is key=                    \ set ROW datatype
  over dup >r >r                       \ setup loop parameters
  begin                                \ begin loop
    dup                                \ limit reached?
  while                                \ if not
    over c@ >r chop r> \embed          \ if embedded, convert number
    if base @ >r base ! 0 -rot >number rot r> base ! then
    r@ c! r> char+ >r                  \ save and update parameters
  repeat                               \ next character
  drop drop r> r> tuck -               \ cleanup stacks
;
[DEFINED] 4TH# [IF]
hide \embed
[THEN]
[THEN]
---8<---
\ 4tH library - S>ESCAPE - Copyright 2008 J.L. Bezemer
\ You can redistribute this file and/or modify it under
\ the terms of the GNU General Public License

\ It allows you to convert an S" string containing escape characters
\ to an escaped string.

\ Typical use: S" \qThis is the end\q" S>ESCAPE

[UNDEFINED] S>ESCAPE [IF]              \ Table of translations for \a..\z.
offset \escape                         ( n -- c)
         7 c,    \ \a
         8 c,    \ \b
    char c c,    \ \c
    char d c,    \ \d
        27 c,    \ \e
        12 c,    \ \f
    char g c,    \ \g
    char h c,    \ \h
    char i c,    \ \i
    char j c,    \ \j
    char k c,    \ \k
        10 c,    \ \l
    char m c,    \ \m
        10 c,    \ \n (Unices only)
    char o c,    \ \o
    char p c,    \ \p
    char " c,    \ \q
        13 c,    \ \r
    char s c,    \ \s
         9 c,    \ \t
    char u c,    \ \u
        11 c,    \ \v
    char w c,    \ \w
    char x c,    \ \x
    char y c,    \ \y
         0 c,    \ \z

: s>escape                             ( a1 n1 -- a1 n2)
  over dup >r >r                       \ setup loop parameters
  begin
    dup                                \ limit reached?
  while                                \ if not
    over c@ dup [char] \ =             \ is it a backslash
    if                                 \ is it an escaped character
      drop chop over c@ dup [char] a - max-n and 26 <
      if [char] a - \escape then       \ if so, escape character
    then
    r@ c! chop r> char+ >r             \ save and update parameters
  repeat                               \ next character
  drop drop r> r> tuck -               \ cleanup stacks
;
[DEFINED] 4TH# [IF]
hide \escape
[THEN]
[THEN]
---8<---


