RfD: String comparison words version 1


Re: RfD: String comparison words version 1

Postby Aleksej Saushev » Wed, 24 Nov 2010 17:33:49 GMT

Alex McDonald < XXXX@XXXXX.COM > writes:


I still don't like standard words being contractions when there's no
problem using the full form. "STRING=" is better in every respect except
length, and it is only 3 letters longer. You don't know whether a
programmer wants to use the "STR" prefix for dynamic strings.


"Identical" or "equal"?

"Identical" means that objects are the same, "equal" means that objects
are allowed to be different yet still equal. Look up "object identity"
in programming literature.


This is unacceptible since it ignores natural string comparison rules.


Same as above.


These are very weak arguments, since all these cases are eliminated by a
primitive peephole optimiser.


Standardising STRING= STR= S= $= and so on is still problematic, yet
you're writing this proposal anyway.

Standardising case insensitive COMPARE would solve remaining string
comparison problems.


A proper implementation of an equality test for strings requires
inspecting all the characters anyway.


And this is a major drawback of this proposal.


Create it. There are more freely available Forths than Win32Forth and
Gforth, and some of them are portable.


This gives grounds to review the practice and amend the standard to
require non-destructive comparison.


This is definitely wrong. If your words are not useful for anything except
the internal problems of your implementation and your programs, they should
not be standardised at all, let alone take useful names.


--
HE CE3OH...

Re: RfD: String comparison words version 1

Postby Elizabeth D Rather » Wed, 24 Nov 2010 17:40:09 GMT


...

I strongly agree.

Cheers,
Elizabeth

-- 
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
 http://www.**--****.com/ 

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Re: RfD: String comparison words version 1

Postby Bernd Paysan » Thu, 25 Nov 2010 00:20:37 GMT




Not with my grep:

~/proj/gforth> grep '$=' $(find . -name '*.fs')
~/proj/gforth> 

Maybe there was some code in regexp.fs, which had a word $= at some time, 
but lost it in the meantime.

-- 
Bernd Paysan
"If you want it done right, you have to do it yourself!"
 http://www.**--****.com/ ~paysan/

Re: RfD: String comparison words version 1

Postby Coos Haak » Thu, 25 Nov 2010 01:52:00 GMT

On Tue, 23 Nov 2010 16:20:37 +0100, Bernd Paysan wrote:




Version 0.7.0 has it in regexp.fs: once as its definition, and once used in ,="
Replacing it with `tuck compare' is no big deal.

-- 
Coos

CHForth, 16 bit DOS applications
 http://www.**--****.com/ 

Re: RfD: String comparison words version 1

Postby Albert van der Horst » Thu, 25 Nov 2010 03:51:19 GMT

In article < XXXX@XXXXX.COM >,


<SNIP>

In the library of ciforth there is $=   ( adr len adr' len' -- flag )
It short-circuits if the lengths are unequal, then does a byte compare
(CORA) over the length on both addresses.
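A minimal sketch of such a word in standard Forth follows. It assumes the stack effect given above and substitutes the standard COMPARE for ciforth's CORA, so it illustrates the length shortcut rather than reproducing ciforth's library code. On a system that already defines $=, loading it will typically just produce a redefinition warning.

```forth
\ Length-first string equality in the style of ciforth's $= (a sketch;
\ standard COMPARE stands in for ciforth's CORA, which is an assumption).
: $= ( c-addr1 u1 c-addr2 u2 -- flag )
  ROT OVER <> IF          \ leaves c-addr1 c-addr2 u2; lengths differ?
    DROP 2DROP FALSE EXIT \ yes: unequal, and no character is fetched
  THEN
  TUCK COMPARE 0= ;       \ c-addr1 u2 c-addr2 u2: compare equal lengths
```

For example, S" abc" S" abcd" $= returns FALSE without fetching a character, while S" abc" S" abc" $= returns TRUE.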

Greetings, Albert


Re: RfD: String comparison words version 1

Postby Andrew Haley » Thu, 25 Nov 2010 04:41:03 GMT



Any COMPARE that does so is nonstandard.  The specification of COMPARE
is straightforward, and does not permit such things.

Andrew.

Re: RfD: String comparison words version 1

Postby Aleksej Saushev » Wed, 01 Dec 2010 03:49:03 GMT

Andrew Haley < XXXX@XXXXX.COM > writes:




Demonstrate it, because if one follows the wording of the (draft)
standard, your understanding does not follow from it.



Re: RfD: String comparison words version 1

Postby Andrew Haley » Wed, 01 Dec 2010 04:35:03 GMT




Well, let's see.

"The strings are compared, beginning at the given addresses, character
by character, up to the length of the shorter string or until a
difference is found."  So, each character in s1 is compared with the
corresponding character in s2 until a difference is found.  There is
no permission, for example, to treat a pair of characters in s1 as
being the same as a single character in s2.  A character type is a
numeric type (A.3.1.2), so any value from 0..255 may be put into a
string and COMPAREd.  There is no restriction to printable ASCII.
Therefore, there is no way to interpret this text as anything other
than comparing two arrays of bytes, byte-for-byte.  (Where a byte is
the size of a char, not necessarily an octet.)
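Read literally, this description can be sketched in a few lines of standard Forth. The name REF-COMPARE is hypothetical, chosen so as not to replace the system's own COMPARE. Characters are fetched with C@ and compared purely as unsigned numbers, and when no difference is found within the shorter length, the lengths decide the result.

```forth
\ REF-COMPARE is a hypothetical name: a literal reading of COMPARE,
\ comparing characters as plain numbers with no locale rules.
: REF-COMPARE ( c-addr1 u1 c-addr2 u2 -- n )
  ROT 2DUP 2>R                  \ c-addr1 c-addr2 u2 u1   R: u2 u1
  MIN 0 ?DO                     \ scan up to the shorter length
    OVER I CHARS + C@           \ c-addr1 c-addr2 c1
    OVER I CHARS + C@           \ c-addr1 c-addr2 c1 c2
    2DUP <> IF                  \ the first difference decides
      U< IF -1 ELSE 1 THEN      \ c1 < c2 means string 1 sorts first
      NIP NIP UNLOOP 2R> 2DROP EXIT
    THEN 2DROP
  LOOP
  2DROP 2R>                     \ no difference found: u2 u1
  2DUP = IF 2DROP 0 EXIT THEN   \ same length: the strings are equal
  U< IF 1 ELSE -1 THEN ;        \ u2 < u1: string 1 is longer, so greater
```

It yields the same -1, 0 or 1 as COMPARE. For example, S" abc" S" ABC" REF-COMPARE gives 1, because 'a' (97) is numerically greater than 'A' (65).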

Andrew.

Re: RfD: String comparison words version 1

Postby Elizabeth D Rather » Wed, 01 Dec 2010 05:57:11 GMT






Remember that standards specify *minimum* requirements.  Any COMPARE has 
to work according to this description in order to be standard.  I think 
it would be perfectly *legal* for a COMPARE to do such things as Alexei 
suggests, but any program that depends on that enhanced behavior would 
obviously have a dependency on it.

Cheers,
Elizabeth


Re: RfD: String comparison words version 1

Postby Alex McDonald » Wed, 01 Dec 2010 06:03:51 GMT





I would not expect aeroplane to match æroplane without a locale, and
the standard contains no such requirement.

Re: RfD: String comparison words version 1

Postby Elizabeth D Rather » Wed, 01 Dec 2010 06:34:40 GMT

On 11/29/10 11:03 AM, Alex McDonald wrote:

Indeed, most people would not expect it, but a system tailored to a
specific locale might perfectly well offer that as an extra feature,
just as systems can, if they wish, provide alternative character sets.
Such a system would be in violation of the standard if it could *not*
match 'aeroplane' correctly, but the extra facility would not render
it non-standard.

Most systems, after all, contain additional features not provided for in
the standard. It's up to the writer of an application whether to use
features of a system that are above and beyond the standard: only the
app writer can assess the value of a particular enhancement and the
likelihood of its needing to be ported to a different system.

Cheers,
Elizabeth


Re: RfD: String comparison words version 1

Postby Alex McDonald » Wed, 01 Dec 2010 07:16:24 GMT

On Nov 29, 9:34pm, Elizabeth D Rather < XXXX@XXXXX.COM > wrote:

Then COMPARE is as Andrew states, surely, as it wouldn't match
correctly. Aleksej is making the case for pairs of characters, not
single characters. I would understand if he suggested that characters
outside ASCII (128 and above) were to be the subject of a locale, but
not ae as a digraph for æ.

A search on LOCALE in this newsgroup is illuminating. There have been
arguments since time immemorial (well, for a decade at least) on the
support for locales. Here's a recent RfD from last year
http://groups.google.com/group/comp.lang.forth/msg/ddd89c75b34e2449?dmode=source

Much of this has been eliminated by UCS and UTF-8 support, and 8 bit
ASCII (if there ever was such a beast) should be allowed to die. Hence
my reluctance to propose ICOMPARE, a case insensitive compare, since
the proposal for ISTR= generated much heat from Aleksej on the
subject; my attempt to restrict case comparisons to the ASCII
characters was apparently "unacceptible [sic] since it ignores natural
string comparison rules", of which he gave equality for "e" and "" as
an example. This, I suspect, absent a suitable proposal for locales
or any logical reason for supporting digraphs, is special pleading for
Cyrillic alphabet support.




Re: RfD: String comparison words version 1

Postby BruceMcF » Wed, 01 Dec 2010 09:05:59 GMT




>>>> Andrew.

>> Remember that standards specify *minimum* requirements.  Any COMPARE has
>> to work according to this description in order to be standard.  I think
>> it would be perfectly *legal* for a COMPARE to do such things as Alexei
>> suggests, but any program that depends on that enhanced behavior would
>> obviously have a dependency on it.

Yes, and it seems the minimum requirement is this. Given that
char >> +n >> n >> x and char >> +n >> u >> x (reading >> as "is a
refinement of"), if characters are to be compared up to the length of
the shorter string or "until a difference is found", then c1 not
differing from c2 implies that, as signed numbers, n1 does not differ
from n2 and, as unsigned numbers, u1 does not differ from u2, since c1
is a refinement of n1 or u1, not the other way around.

Generically, ICOMPARE would operate on strings of xchars, and the ISTR=
benefit of testing the length first may well go away, since the length
of a string of xchars is the number of pchars in it, not the number of
xchars, so two strings that match case-insensitively can differ in length.

Re: RfD: String comparison words version 1

Postby Andrew Haley » Wed, 01 Dec 2010 18:33:21 GMT








I accept the principle, but I do not believe that the specification
for COMPARE has sufficient wiggle room to allow it in this case.  A
character can be anything between 0..255, as the standard says.

Andrew.

Similar Threads:

1. RfD: String comparison words version 1 (was version 0)

2. RfD: String comparison words version 0

RfD: String comparison words

Change history
2010-11-05 Initial proposal, incomplete

Problem
-------

Although ANS Forth provides COMPARE for string comparisons, it has two
attributes that make it problematic; it is case sensitive and
expensive to execute due to the range of return values. Extend the
functionality of COMPARE to provide comparisons that are case-
insensitive, and comparisons that test for only equality.

STR= ( c-addr1 u1 c-addr2 u2 -- n )  STRING-EXT

Compare the string specified by c-addr1 u1 to the string specified by
c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
not be modified. If the strings are of different lengths (u1 is not
equal to u2), n is zero (0). Otherwise the strings are compared,
beginning at the given addresses, character by character, up to the
equal length of the strings or until a difference is found. Characters
are considered identical if they have the same numeric value. If the
two strings are identical, n is zero.

ISTR= ( c-addr1 u1 c-addr2 u2 -- n )  STRING-EXT

Compare the string specified by c-addr1 u1 to the string specified by
c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
not be modified. If the strings are of different lengths, n is zero
(0). If both strings are null (u1 and u2 are both zero), n is one (1).
Otherwise the strings are fetched, beginning at the given addresses,
character by character. Characters are considered identical if they
have the same numeric value, or if the characters fall between ASCII
values 'A' thru 'Z' they are considered to be identical to the
corresponding character values in the range 'a' thru 'z'. If the two
strings are identical, n is one (1); otherwise n is zero (0).

Remarks
-------

Why standardize these words? They can be defined in ANS Forth;

: STR= COMPARE 0= ;
: ISTR= <definition required> ;

For the following reasons:

    * Many systems define STR= or ISTR= (or the equivalent with
another name)
    * Case insensitive Forths require such a word to search the
dictionary in a case-insensitive manner, and many expose these words
or their equivalents to the user.
    * The commonest use of COMPARE is in the form COMPARE 0=.
    * They will be implemented more efficiently on many systems.
Although string manipulation and handling is not employed extensively,
text processing applications benefit significantly.
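The `<definition required>` placeholder for ISTR= can be filled in standard Forth along the lines below. This is only a sketch: it follows the later draft's convention of returning a well-formed flag (TRUE or FALSE) rather than 1 or 0, folds only ASCII 'A' to 'Z', and ASCII-LOWER is a hypothetical helper name. Neither string is written to; each character is folded on the stack.

```forth
\ ASCII-LOWER is a hypothetical helper: fold 'A'..'Z' to 'a'..'z' only.
: ASCII-LOWER ( c -- c' )
  DUP [CHAR] A [CHAR] Z 1+ WITHIN IF 32 OR THEN ;

\ Case-insensitive equality; both strings are left unmodified.
: ISTR= ( c-addr1 u1 c-addr2 u2 -- flag )
  ROT OVER <> IF DROP 2DROP FALSE EXIT THEN  \ unequal lengths: no match
  0 ?DO                                      \ c-addr1 c-addr2
    OVER I CHARS + C@ ASCII-LOWER
    OVER I CHARS + C@ ASCII-LOWER
    <> IF 2DROP FALSE UNLOOP EXIT THEN       \ first mismatch: no match
  LOOP
  2DROP TRUE ;                               \ also covers two null strings
```

With this sketch, S" Forth" S" FORTH" ISTR= returns TRUE, while S" [" S" {" ISTR= returns FALSE even though those two characters differ only in the bit that separates 'A' from 'a'.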

Why no case-insensitive COMPARE?

While 'a' and 'A' can be considered equal, it is problematic to assign
a meaning to a comparison of 'a' against 'B' in terms of 'greater
than' or 'less than'. Numerically, 'B' (65 decimal) is less than
'a' (96 decimal), but collation sequences are normally defined in
terms of case-insensitive tests; 'A' precedes 'ab', which precedes
'B'. This RfD does not attempt to address these issues.

Note that the implementation of STR= and ISTR= does not describe the
values of c-addr1 or c-addr2 when u1 <> u2 (unequal length strings),
or when u1=u2=0 (null strings). Given that different implementations
may address these in their own way, supplying invalid values of
c-addr1 and c-addr2 in those cases (those that would cause an error if a
single character was fetched from either of those addresses) is an
ambiguous condition.

Experience
----------

As a case insensitive Forth, Win32Forth exposes ISTR= , used to search
wordlists, as defined here.

<others>

3. RfD: String comparison words version 1

Alex McDonald < XXXX@XXXXX.COM > writes:

> RfD: String comparison words (Draft version 1)
>
> Change history
> 2010-11-05 Initial proposal, incomplete
> 2010-11-22 Expanded Remarks section
>            Expanded Experience section
>            Correction of errors
>
> Problem
> -------
>
> Although ANS Forth provides COMPARE for string comparisons, it has two
> attributes that make it problematic; it is case sensitive, and
> expensive to execute for equality or inequality (the common cases) due
> to the need to complete the comparison to return greater-than or less-
> than return values.
>
> Add to the existing functionality of COMPARE to provide comparisons
> that are case-insensitive and that only test for equality.
>
> STR= ( c-addr1 u1 c-addr2 u2 -- flag )  STRING-EXT

I still don't like standard words being contractions when there's no
problem using the full form. "STRING=" is better in every respect except
length, and it is only 3 letters longer. You don't know whether a
programmer wants to use the "STR" prefix for dynamic strings.

> Compare the string specified by c-addr1 u1 to the string specified by
> c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
> not be modified. If the strings are of different lengths (u1<>u2),
> flag is FALSE. Otherwise the strings are compared, beginning at the
> given addresses, character by character, up to the equal length of the
> strings or until a difference is found. If the two strings are
> identical, flag is TRUE.

"Identical" or "equal"?

"Identical" means that objects are the same, "equal" means that objects
are allowed to be different yet still equal. Look up "object identity"
in programming literature.

> ISTR= ( c-addr1 u1 c-addr2 u2 -- flag )  STRING-EXT
>
> Compare the string specified by c-addr1 u1 to the string specified by
> c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
> not be modified. If the strings are of different lengths, flag is
> FALSE. Otherwise the strings are fetched, beginning at the given
> addresses, character by character. Characters are considered to match
> if they have the same numeric value, or, if the characters fall
> between ASCII values 'A' thru 'Z', they are considered to be identical
> to the corresponding character values in the range 'a' thru 'z'.

This is unacceptible since it ignores natural string comparison rules.

> If
> the two strings are identical, flag is TRUE, otherwise FALSE.

Same as above.

> Remarks
> -------
>
> Why standardize these words? They can be defined in ANS Forth, for
> example;
>
> : STR= COMPARE 0= ;
>
> For the following reasons:
>
> Most uses of COMPARE are for string equality or inequality, or for
> string prefixes.
>
> Using COMPARE to test for inequality is inefficient, as strings with
> unequal lengths can immediately be declared unequal; but COMPARE must
> continue to fetch and check characters to determine whether it should
> return greater-than or less-than, even though the result of this
> additional work will be discarded.
>
> For strings of equal length, the overhead is less significant, but the
> result of the comparison must still be adjusted to indicate the
> required result.

These are very weak arguments, since all these cases are eliminated by a
primitive peephole optimiser.

> Although string manipulation and handling is not employed extensively,
> text processing applications benefit significantly. Letting the
> compiler optimize uses of COMPARE 0= into a more efficient word is
> possible, but the programmer must employ an expensive COMPARE followed
> by tests to reduce the range of the result on systems that do not
> synthesize more efficient tests for equality.
>
> Case insensitive Forths require words to search the dictionary in a
> case-insensitive manner. These tests and tests for prefixes require
> that the tested argument is either converted to all upper case (or all
> lower case), which generally requires copying the original string to a
> transient area and performing a suitable case translation, followed by
> an expensive COMPARE operation.
>
> Why no case-insensitive COMPARE?
>
> There are a wide variety of case-insensitive words employed by Forths
> for this function; ICOMPARE, COMPARE(NC), UCOMPARE amongst others.
> Standardising such widely varying words would be problematic.

Standardising STRING= STR= S= $= and so on is still problematic, yet
you're writing this proposal anyway.

Standardising case insensitive COMPARE would solve remaining string
comparison problems.
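For comparison, a case-insensitive COMPARE of the kind suggested here might look like the sketch below. ICOMPARE is one of the names the proposal lists as in use, but this definition is an assumption, not any system's actual word: it folds only ASCII letters to lower case and otherwise orders characters numerically, like COMPARE. The folding direction matters, which is the ordering problem raised in the version 0 draft: folding to lower case puts '_' (95) before every letter, whereas folding to upper case would put it after them.

```forth
\ ASCII-LOWER is a hypothetical helper: fold 'A'..'Z' to 'a'..'z' only.
: ASCII-LOWER ( c -- c' )
  DUP [CHAR] A [CHAR] Z 1+ WITHIN IF 32 OR THEN ;

\ Case-insensitive three-way comparison, same -1/0/1 results as COMPARE.
: ICOMPARE ( c-addr1 u1 c-addr2 u2 -- n )
  ROT 2DUP 2>R MIN 0 ?DO        \ c-addr1 c-addr2   R: u2 u1
    OVER I CHARS + C@ ASCII-LOWER
    OVER I CHARS + C@ ASCII-LOWER
    2DUP <> IF                  \ the first folded difference decides
      U< IF -1 ELSE 1 THEN NIP NIP UNLOOP 2R> 2DROP EXIT
    THEN 2DROP
  LOOP
  2DROP 2R> 2DUP = IF 2DROP 0 EXIT THEN
  U< IF 1 ELSE -1 THEN ;        \ one is a prefix: the longer is greater
```

With this sketch, S" Forth" S" FORTH" ICOMPARE 0= is true, which is the test ISTR= performs.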

> Why no STR<, STR>, STR>= and so on?
>
> The implementation of any test beyond equality requires inspecting all
> the characters for the length of the shortest.

A proper implementation of an equality test for strings requires
inspecting all the characters anyway.

> The differentiation
> between greater-than and less-than is trivial for implementations of
> COMPARE to determine, as it is set on meeting the first non-equal
> character, or on exhausting one or other of the strings. All of these
> variants can be efficiently written using COMPARE.
>
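> : STR<  ( c-addr1 u1 c-addr2 u2 -- flag ) COMPARE 0< ;
> : STR>  ( c-addr1 u1 c-addr2 u2 -- flag ) COMPARE 0> ;
> : STR>= ( c-addr1 u1 c-addr2 u2 -- flag ) COMPARE 0< 0= ;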
>
> and so on.
>
> The current proposal does not allow the synthesizing of case
> insensitive comparisons due to a lack of appropriate ICOMPARE (or
> COMPARE(NC) etc).

And this is a major drawback of this proposal.

> Experience
> ----------
>
> As a case insensitive Forth, Win32Forth exposes ISTR= , used to search
> wordlists, as defined here, and supplies a STR= not based on COMPARE.
>
> MPE's VFX Forth supplies STR= S= and IS=. S= is a buffer compare with
> the signature ( c-addr1 c-addr2 u -- flag ); IS= is the case
> insensitive equivalent. S= and IS= can be efficiently synthesized from
> STR= and ISTR= respectively;
>
> : S= ( c-addr1 c-addr2 u -- flag ) TUCK STR= ;
> : IS= ( c-addr1 c-addr2 u -- flag ) TUCK ISTR= ;
>
> [ Does VFX Forth provide an equivalent to ISTR=? ]
>
> Gforth supplies STR= STR< and STRING-PREFIX?.
>
> STRING-PREFIX? can be synthesized from STR= ;
>
> : STRING-PREFIX? ( c-addr1 u1 c-addr2 u2 -- flag )
>   TUCK 2>R MIN 2R> STR= ;
>
> [ Information on other Forths required here ]

Create it. There are more freely available Forths than Win32Forth and
Gforth, and some of them are portable.

> Comments
> --------
>
> The ANS definition of COMPARE does not explicitly declare whether the
> input strings are read-only. Since COMPARE states that characters are
> "compared", the assumption is that they are read-only since no
> reasonable implementation needs to employ a destructive test.

This gives grounds to review the practice and amend the standard to
require non-destructive comparison.

> With
> case-insensitive string comparisons, this RfD makes it clear that they
> are read-only, as implementors might be tempted to lower-case or
> upper-case one or both of the strings prior to comparison.
>
> Note that the implementations do not assign a meaning to the values of
> c-addr1 or c-addr2 when u1<>u2 (unequal length strings), or when
> u1=u2=0 (null strings which always return TRUE). Given that different
> implementations may address these in their own way, supplying invalid
> values of c-addr1 and c-addr2 in those cases (those that would cause
> an error if a single character was fetched from either of those
> addresses) is an ambiguous condition.
>
> Case-insensitivity only considers ASCII 'A' thru 'Z' to be equal to
> the corresponding ASCII characters 'a' thru 'z'. No other characters
> outside that range are considered equal.

This is definitely wrong. If your words are not useful for anything except
the internal problems of your implementation and your programs, they should
not be standardised at all, let alone take useful names.



4. RfD: Escaped Strings version 4

5. RfD: Escaped Strings S\" (version 5)

6. RfD: Escaped Strings (Version 6.2)

7. RfD: Escaped Strings (Version 6)

8. Repairing parsing words (was: RfD: Defered words)


