RfD: String comparison words version 0



Postby Alex McDonald » Sun, 07 Nov 2010 06:31:12 GMT

RfD: String comparison words

Change history
2010-11-05 Initial proposal, incomplete

Problem
-------

Although ANS Forth provides COMPARE for string comparisons, it has two
attributes that make it problematic; it is case sensitive and
expensive to execute due to the range of return values. Extend the
functionality of COMPARE to provide comparisons that are case-
insensitive, and comparisons that test for only equality.

STR= ( c-addr1 u1 c-addr2 u2 -- n )  STRING-EXT

Compare the string specified by c-addr1 u1 to the string specified by
c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
not be modified. If the strings are of different lengths (u1 is not
equal to u2), n is zero (0). Otherwise the strings are compared,
beginning at the given addresses, character by character, up to the
equal length of the strings or until a difference is found. Characters
are considered identical if they have the same numeric value. If the
two strings are identical, n is zero.

ISTR= ( c-addr1 u1 c-addr2 u2 -- n )  STRING-EXT

Compare the string specified by c-addr1 u1 to the string specified by
c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
not be modified. If the strings are of different lengths, n is zero
(0). If both strings are null (u1 and u2 are both zero), n is one (1).
Otherwise the strings are fetched, beginning at the given addresses,
character by character. Characters are considered identical if they
have the same numeric value, or if the characters fall between ASCII
values 'A' thru 'Z' they are considered to be identical to the
corresponding character values in the range 'a' thru 'z'. If the two
strings are identical, n is one (1); otherwise n is zero (0).

Remarks
-------

Why standardize these words? They can be defined in ANS Forth;

: STR= COMPARE 0= ;
: ISTR= <definition required> ;

For the following reasons:

    * Many systems define STR= or ISTR= (or the equivalent with
another name)
    * Case insensitive Forths require such a word to search the
dictionary in a case-insensitive manner, and many expose these words
or their equivalents to the user.
    * The commonest use of COMPARE is in the form COMPARE 0=.
    * They will be implemented more efficiently on many systems.
Although string manipulation and handling is not employed extensively,
text processing applications benefit significantly.

Why no case-insensitive COMPARE?

While 'a' and 'A' can be considered equal, it is problematic to assign
a meaning to a comparison of 'a' against 'B' in terms of 'greater
than' or 'less than'. Numerically, 'B' (66 decimal) is less than
'a' (97 decimal), but collation sequences are normally defined in
terms of case-insensitive tests; 'A' precedes 'ab', which precedes
'B'. This RfD does not attempt to address these issues.
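
The numeric ordering can be checked directly with the standard COMPARE. A minimal sketch, assuming a system that provides the STRING word set; SHOW-ORDER is a name made up for this illustration:

```forth
\ SHOW-ORDER is a hypothetical demo word; COMPARE is the standard STRING word.
: show-order ( -- )
  s" B" s" ab" compare .  cr ;   \ prints -1: "B" precedes "ab" by character code
show-order
```

A case-insensitive collation would want the opposite answer for this pair, which is the difficulty described above.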

Note that the implementation of STR= and ISTR= does not describe the
values of c-addr1 or c-addr2 when u1 <> u2 (unequal length strings),
or when u1=u2=0 (null strings). Given that different implementations
may address these in their own way, supplying invalid values of
c-addr1 and c-addr2 in those cases (those that would cause an error if a
single character was fetched from either of those addresses) is an
ambiguous condition.

Experience
----------

As a case insensitive Forth, Win32Forth exposes ISTR= , used to search
wordlists, as defined here.

<others>

Re: RfD: String comparison words version 0

Postby Alex McDonald » Sun, 07 Nov 2010 06:38:15 GMT


For STR= replace the sentence

If the two strings are identical, n is zero.

with

If the two strings are identical, n is one (1).


Re: RfD: String comparison words version 0

Postby Alex McDonald » Sun, 07 Nov 2010 07:09:40 GMT


Some things I'm not sure should be in the RfD; advice please.

Comments
--------

For ISTR= it's unclear to me if all control characters (ASCII values <
32) and ' ' (blank, value 32) should compare equal.

I anticipate that c-addr1 u1 and c-addr2 u2 can overlap. For example;

CREATE TEST 65 c, 65 c, 65 c, 65 c, \ "AAAA"
TEST 3 TEST 1+ 3 STR=

returns 1 if TEST consists of all the same character. Should that be
spelled out in the RfD?

XISTR= might be considered for the XCHARS proposal, or at least the
word reserved. (Case insensitivity is complex for UTF code points.)


Re: RfD: String comparison words version 0

Postby Aleksej Saushev » Sun, 07 Nov 2010 08:35:27 GMT

Alex McDonald < XXXX@XXXXX.COM > writes:


...


Why do you choose contraction instead of full form?
Where are references to other programming languages?
Why do you consistently ignore everything around?

You've ignored all experience in other modern programming languages,
there's clear trend _not_ to use contractions: "string=?" (RnRS),
"string=" (CLHS).

Revised^5 Report on Scheme gives names: "string=?" and "string-ci=?"
In addition it suggests reasonable names for order predicates:
string<? string>? string<=? string>=?
string-ci<? string-ci>? string-ci<=? string-ci>=?

Common Lisp HyperSpec gives names "STRING=" and "STRING/=",
"STRING-EQUAL" and "STRING-NOT-EQUAL".
Order predicates are: STRING< STRING> STRING<= STRING>=
STRING-LESSP STRING-GREATERP STRING-NOT-GREATERP STRING-NOT-LESSP

Both propose consistently named set of order and equivalence predicates.


Even if you're not going to address collation problems, it doesn't mean
you're free to do whatever you want. All this brings headache to those
who do try to address it.


"I use it and I want it in standard no matter what."
Where're references to other case insensitive systems?


Problems with your proposal are usual to Forth community.

1. You don't overview existing practice.
2. You ignore already existing practice.
3. You don't provide complete solution.
4. In addition to 3, you don't consider extension possibilities.
5. Solution you propose isn't consistent with (already standardised) "compare"
and "ucompare" (used in some wide-spread systems).


--
HE CE3OH...

Re: RfD: String comparison words version 0

Postby Alex McDonald » Sun, 07 Nov 2010 09:23:09 GMT

On Nov 5, 11:35m, Aleksej Saushev < XXXX@XXXXX.COM > wrote:
>
> Why do you choose contraction instead of full form?

Forth isn't COBOL. There are at least 3 systems to my knowledge that
use one or more of the contracted forms; Win32Forth, MPE's VFX and
gforth.

> Where are references to other programming languages?
> Why do you consistently ignore everything around?

I can't find any stricture that other programming languages needed to
be referenced in an RfD, or that Forth needs to reflect their
peculiarities. I'm not quite sure what consistency I'm displaying,
since this is the first RfD I've written.

>
> You've ignored all experience in other modern programming languages,
> there's clear trend _not_ to use contractions: "string=?" (RnRS),
> "string=" (CLHS).

What are RnRs and CLHS? Are they some contractions that refer to
Scheme and Lisp?

Forth has many "contractions"; presumably you disapprove of them too.
Recent inclusions: CFIELD instead of CHARACTER-FIELD. Older examples;
S" instead of STRING" and so on. Again, Forth is not COBOL, nor is it
Scheme or Lisp.

>
> Revised^5 Report on Scheme gives names: "string=?" and "string-ci=?"

I'm not writing the RfD for the next revised Scheme report.

> In addition it suggests reasonable names for order predicates:
> string<? string>? string<=? string>=?
> string-ci<? string-ci>? string-ci<=? string-ci>=?

I'm not proposing order predicates.

>
> Common Lisp HyperSpec gives names "STRING=" and "STRING/=",
> "STRING-EQUAL" and "STRING-NOT-EQUAL".
> Order predicates are: STRING< STRING> STRING<= STRING>=
> STRING-LESSP STRING-GREATERP STRING-NOT-GREATERP STRING-NOT-LESSP
>
> Both propose consistently named set of order and equivalence predicates.

Compare to COMPARE. Too late.

> Even if you're not going to address collation problems, it doesn't mean
> you're free to do whatever you want. All this brings headache to those
> who do try to address it.

Re: RfD: String comparison words version 0

Postby George Hubert » Sun, 07 Nov 2010 09:44:30 GMT

On Nov 5, 9:38m, Alex McDonald < XXXX@XXXXX.COM > wrote:
> For STR= replace the sentence
>
> If the two strings are identical, n is zero.
>
> with
>
> If the two strings are identical, n is one (1).

Shouldn't it be TRUE (i.e. -1) for a match? That would mean the stack
comment should be f, not n, for the result.

George Hubert
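
The practical difference between 1 and a well-formed flag shows up as soon as the result is combined with bitwise words. A minimal sketch; FLAG-DEMO is a name invented for this illustration, and it assumes a system where TRUE is all bits set, as ANS Forth specifies:

```forth
\ FLAG-DEMO is a hypothetical demo word showing why -1 is preferred over 1.
: flag-demo ( -- )
  1 invert      if ." 1 INVERT is still nonzero (true)" cr then
  -1 invert 0=  if ." -1 INVERT is zero (false)" cr then
  1 2 and 0=    if ." 1 AND 2 is zero although both are nonzero" cr then ;
flag-demo
```

All three messages are printed, so a result of 1 cannot safely be fed to INVERT or AND the way a flag can.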

Re: RfD: String comparison words version 0

Postby Elizabeth D Rather » Sun, 07 Nov 2010 10:33:57 GMT




...

 From a practical POV, it's usually more useful for a failed compare to 
return TRUE, so it can be used before an IF that is going to try 
something else, THROW, or whatever.  The code is more straightforward 
that way:

: foo ( stuff -- more stuff)   <get strings>
    <compare> ABORT" No match!"
    <continue normal processing> ;

Cheers,
Elizabeth

-- 
==================================================
Elizabeth D. Rather   (US & Canada)   800-55-FORTH
FORTH Inc.                         +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
 http://www.**--****.com/ 

"Forth-based products and Services for real-time
applications since 1973."
==================================================
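
The pattern above already works with the standard COMPARE, since it returns nonzero on a mismatch. A minimal sketch, with EXPECT-HELLO and TRY-HELLO as hypothetical names and "hello" as an arbitrary example string:

```forth
\ EXPECT-HELLO is a hypothetical word; COMPARE leaves nonzero on a mismatch,
\ and ABORT" consumes that value as its flag.
: expect-hello ( c-addr u -- )
  s" hello" compare abort" No match!"
  ." matched" cr ;

: try-hello ( -- )  s" hello" expect-hello ;
try-hello
```

With an equality word that returns TRUE on a match, the same line would need a 0= before ABORT".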

Re: RfD: String comparison words version 0

Postby BruceMcF » Sun, 07 Nov 2010 14:55:28 GMT



> returns 1 if TEST consists of all the same character. Should that be
> spelled out in the RfD?

It doesn't need to be spelled out.

How is ICOMPARE specified in implementations where it exists? With
ICOMPARE, the equalities are a chain of interfaces:

: STR= ( ca1 u1 ca2 u2 -- flag ) COMPARE 0= ;
: ISTR= ( ca1 u1 ca2 u2 -- flag ) ICOMPARE 0= ;

: S= ( ca1 ca2 u -- flag ) TUCK STR= ;
\ "Strings Equal" to length u

: IS= ( ca1 ca2 u -- flag ) TUCK ISTR= ;
\ Case "Insensitive String Equal", to length u

: SS= ( ca1 u1 ca2 u2 -- flag ) ROT UMIN S= ;
\ "Substrings equal" to length of the shortest

: ISS= ( ca1 u1 ca2 u2 -- flag ) ROT UMIN IS= ;
\ Case "Insensitive substrings equal" to length of shortest

Re: RfD: String comparison words version 0

Postby Alex McDonald » Sun, 07 Nov 2010 23:38:08 GMT




> > > If the two strings are identical, n is zero.
> > >
> > > with
> > >
> > > If the two strings are identical, n is one (1).
> >
> > Shouldn't it be TRUE (i.e. -1) for a match, which means the stack
> > comment should be f not n for the result.
>
> From a practical POV, it's usually more useful for a failed compare to
> return TRUE, so it can be used before an IF that is going to try
> something else, THROW, or whatever. The code is more straightforward
> that way:
>
> : foo ( stuff -- more stuff)   <get strings>
>     <compare> ABORT" No match!"
>     <continue normal processing> ;

My bad, it should be -1; I'll correct in the next version.
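
With the result being a proper flag, the length-first test described in the RfD might be sketched as below. This is only a sketch under that assumption, not the Win32Forth code or any other system's implementation; the name MY-STR= is made up here so that it does not collide with a system-supplied STR=.

```forth
\ MY-STR= is a hypothetical name, chosen to avoid redefining a system's STR=.
: my-str= ( c-addr1 u1 c-addr2 u2 -- flag )
  rot over <> if                 \ lengths differ: no character is fetched
    2drop drop false exit
  then                           ( c-addr1 c-addr2 u )
  0 ?do
    over i chars + c@            \ i-th character of the first string
    over i chars + c@            \ i-th character of the second string
    <> if  2drop false unloop exit  then
  loop
  2drop true ;                   \ same length, no difference found
```

Two empty strings give TRUE here, and the addresses are never dereferenced when the lengths differ or are both zero.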

Re: RfD: String comparison words version 0

Postby Alex McDonald » Sun, 07 Nov 2010 23:51:20 GMT




> > CREATE TEST 65 c, 65 c, 65 c, 65 c, \ "AAAA"
> > TEST 3 TEST 1+ 3 STR=
> >
> > returns 1 if TEST consists of all the same character. Should that be
> > spelled out in the RfD?
>
> It doesn't need to be spelled out.
>
> How is ICOMPARE specified in implementations where it exists? With
> ICOMPARE, the equalities are a chain of interfaces:
>
> : STR= ( ca1 u1 ca2 u2 -- flag ) COMPARE 0= ;
> : ISTR= ( ca1 u1 ca2 u2 -- flag ) ICOMPARE 0= ;

I don't think ICOMPARE can be satisfactorily defined due to the issue
of collating sequence. While equality for 'A'='a' etc can be easily
defined, should the string "B" be greater than "ab"? Numerically, "B"
is less than "a".

> : S= ( ca1 ca2 u -- flag ) TUCK STR= ;
> \ "Strings Equal" to length u
>
> : IS= ( ca1 ca2 u -- flag ) TUCK ISTR= ;
> \ Case "Insensitive String Equal", to length u

I think these two are in MPE's VFX. I seem to remember a note from
Anton preferring S= ( ca1 u1 ca2 u2 -- flag ) over STR=, but he noted
that VFX had already nabbed it.
>
> : SS= ( ca1 u1 ca2 u2 ) ROT UMIN S= ;
> \ "Substrings equal" to length of the shortest
>
> : ISS= ( ca1 u1 ca2 u2 ) ROT UMIN IS= ;
> \ Case "Insensitive substrings equal to length of shortest

Re: RfD: String comparison words version 0

Postby BruceMcF » Mon, 08 Nov 2010 01:04:04 GMT



It's straightforward; it's just that there are two straightforward
answers, and picking one is mostly a coordination problem.

Either treat all upper case characters as lower case and COMPARE,
or treat all lower case characters as upper case and COMPARE.

The existing practice that I know of is the Toolbox which relies on
copying the string and converting to lower case, so it would be the
first version.

When using COMPARE for sorting, you want ICOMPARE to sort strings as:

BRMCF
Bruce McF
brucemcf
BruceRMcF

.... and to treat strings that differ only in case as equal.
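
The first alternative (fold to lower case, then compare) might be sketched as follows. Unlike the Toolbox approach mentioned above, this sketch folds each character on the fly instead of copying the string; FOLD-LOWER and MY-ICOMPARE are names invented for the illustration, and only ASCII 'A' thru 'Z' are folded.

```forth
\ FOLD-LOWER and MY-ICOMPARE are hypothetical names for this sketch.
: fold-lower ( char -- char' )
  dup [char] A [char] Z 1+ within if  32 +  then ;   \ ASCII 'A'..'Z' only

: my-icompare ( c-addr1 u1 c-addr2 u2 -- n )   \ n as for COMPARE: -1, 0 or 1
  rot swap 2dup 2>r min          ( c-addr1 c-addr2 u ) ( R: u1 u2 )
  0 ?do
    over i chars + c@ fold-lower
    over i chars + c@ fold-lower ( c-addr1 c-addr2 c1 c2 )
    2dup <> if
      < if -1 else 1 then        \ first differing character decides
      nip nip  unloop 2r> 2drop exit
    then
    2drop
  loop
  2drop 2r>                      ( u1 u2 )
  2dup < if  2drop -1 exit  then \ common prefix: the shorter string sorts first
  > if 1 else 0 then ;
```

Under these assumptions the four example strings come out in the order listed, and "Forth" and "FORTH" compare as equal.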

Re: RfD: String comparison words version 0

Postby Alex McDonald » Mon, 08 Nov 2010 01:23:48 GMT





Thanks, I realise I've been rather brain dead on the subject. Agreed.
I'll change the RfD to reflect this.

But this brings me to a point that I think is worth making; the
factoring of COMPARE and ICOMPARE is backwards, in the sense that the
fundamental word is overspecified, and all the derivatives underspecify
to get the desired result; case in point : STR= COMPARE 0= ;

That's not the normal Forth way of doing things, where smaller words
build bigger, but rather this is a big overfunctional word that's used
to make smaller.

Re: RfD: String comparison words version 0

Postby Aleksej Saushev » Mon, 08 Nov 2010 03:44:28 GMT

Alex McDonald < XXXX@XXXXX.COM > writes:

>> Why do you choose contraction instead of full form?
>
> Forth isn't COBOL. There are at least 3 systems to my knowledge that
> use one or more of the contracted forms; Win32Forth, MPE's VFX and
> gforth.

You're trying to make Forth look like Perl, which is worse than COBOL.
There're reasons why people try not to use contractions when they design
languages. Repeating mistakes of past and enforcing these mistakes on
new implementations isn't good.

>> Where are references to other programming languages?
>> Why do you consistently ignore everything around?
>
> I can't find any stricture that other programming languages needed to
> be referenced in an RfD, or that Forth needs to reflect their
> peculiarities. I'm not quite sure what consistency I'm displaying,
> since this is the first RfD I've written.

Forth is programming language still, and subject to general rules on
programming languages. Here you're repeating the same mistake again,
you design part of more or less general interface without considering
consequences of this neglection.

>> You've ignored all experience in other modern programming languages,
>> there's clear trend _not_ to use contractions: "string=?" (RnRS),
>> "string=" (CLHS).
>
> What are RnRs and CLHS? Are they some contractions that refer to
> Scheme and Lisp?

The explanation is right above: Revised^n Report on Scheme, Common Lisp
HyperSpec. That you don't know it makes you bad language designer.

> Forth has many "contractions"; presumably you disapprove of them too.
> Recent inclusions: CFIELD instead of CHARACTER-FIELD. Older examples;
> S" instead of STRING" and so on. Again, Forth is not COBOL, nor is it
> Scheme or Lisp.

Forth _must_ take Scheme and Lisp experience into account. Otherwise it
is doomed to repeat same mistakes, or worse, the very mistakes Scheme
and Common Lisp fixed when they were designed.

In particular, "cfield" is example of bad name. Try to read the story
around "nconc" in Lisp. Reading about "caddddr" is useful too.

>> Revised^5 Report on Scheme gives names: "string=?" and "string-ci=?"
>
> I'm not writing the RfD for the next revised Scheme report.

You're writing RfD. If you want the result look serious, you ought to
know what you're doing. There're not so many languages with multiple
implementations, ignoring their experience is another case of Hugh
syndrome. Perhaps you should read more about programming languages
before you start "improving" anything, otherwise you'll get into
situation like Hugh with his balanced trees.

>> In addition it suggests reasonable names for order predicates:
>> string<? string>? string<=? string>=?
>> string-ci<? string-ci>? string-ci<=? string-ci>=?
>
> I'm not proposing order predicates.

And this is major defect in your RfD.

>> Common Lisp HyperSpec gives names "STRING=" and "STRING/=",
>> "STRING-EQUAL" and "STRING-NOT-EQUAL".
>> Order predicates are: STRING< STRING> STRING<= STRING>=
>> STRING-LESSP STRING-GREATERP STRING-NOT-

Re: RfD: String comparison words version 0

Postby BruceMcF » Mon, 08 Nov 2010 04:58:50 GMT






But I did not specify STR< S< STR> S> ... then the COMPARE 0= is no
longer in there.


* A string greater/less than comparison word *is required*, either
two, a greater than and a less than, or a single with a tri-state
return. There is *no substantial difference between the two in their
inner loop*, and no difference in the test on unequal, with the only
difference being return flags. So the tri-state return is the more
efficient, requiring that loop once rather than twice.

* A string equality / inequality comparison word is required, either
one, and once the tri-state greater/equal/less word is available,
string equality exists for little cost in terms of code size.

* If code-size is no issue and a hardware STR= or S= is available that
is substantially more efficient than the comparison, or if a hardware
S< S= is available so COMPARE is not the character by character loop
but built from string level operations, then use the S= and build STR=
from that.

Now, *in* typical Forth fashion, there is an opportunity to take
advantage of a hardware operating speed efficiency in a processor
that offers it, an opportunity to take advantage of a built-in
S/Z flag hardware COMPARE, and a code-space efficient platform for
string comparison in small implementations without the hardware string
function support and/or that do not want to devote a lot of kernel
size to string comparison.

Re: RfD: String comparison words version 0

Postby BruceMcF » Mon, 08 Nov 2010 05:09:47 GMT



Or more concisely, just like 0= 0< and - suffice to generate 0<> 0> =
< > => =< and <> ... but 0= and - alone support only 0<> = and <> ....
COMPARE, which overlays STR= and STR<, suffices to generate a complete
set of comparisons, while STR= alone supports only a subset.

However, it would be a handy thing to have standard names for that
set, to be able to test whether or not they are provided and use the
built in version when available rather than having to build up from
COMPARE from scratch every time.
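
One way to "use the built in version when available" can be sketched with conditional compilation. This assumes the host provides [UNDEFINED] (Forth 2012 TOOLS EXT) and that COMPARE follows the ANS convention of returning -1 when the first string is less; the STR-prefixed names simply follow the thread and are not taken from any standard.

```forth
\ Define each word only if the system does not already supply it.
\ [UNDEFINED] is assumed to be available (Forth 2012 TOOLS EXT).
[undefined] str=  [if] : str=  ( c-addr1 u1 c-addr2 u2 -- flag ) compare 0=    ; [then]
[undefined] str<> [if] : str<> ( c-addr1 u1 c-addr2 u2 -- flag ) compare 0<>   ; [then]
[undefined] str<  [if] : str<  ( c-addr1 u1 c-addr2 u2 -- flag ) compare 0<    ; [then]
[undefined] str>  [if] : str>  ( c-addr1 u1 c-addr2 u2 -- flag ) compare 0>    ; [then]
[undefined] str>= [if] : str>= ( c-addr1 u1 c-addr2 u2 -- flag ) compare 0< 0= ; [then]
[undefined] str<= [if] : str<= ( c-addr1 u1 c-addr2 u2 -- flag ) compare 0> 0= ; [then]
```

On a system that already ships a faster STR= or STR<, the guard leaves that version in place and only the missing words are built from COMPARE.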

Similar Threads:

1.RfD: String comparison words version 1 (was version 0)

2.RfD: String comparison words version 1

Alex McDonald < XXXX@XXXXX.COM > writes:

> RfD: String comparison words (Draft version 1)
>
> Change history
> 2010-11-05 Initial proposal, incomplete
> 2010-11-22 Expanded Remarks section
>            Expanded Experience section
>            Correction of errors
>
> Problem
> -------
>
> Although ANS Forth provides COMPARE for string comparisons, it has two
> attributes that make it problematic; it is case sensitive, and
> expensive to execute for equality or inequality (the common cases) due
> to the need to complete the comparison to return greater-than or less-
> than return values.
>
> Add to the existing functionality of COMPARE to provide comparisons
> that are case-insensitive and that only test for equality.
>
> STR= ( c-addr1 u1 c-addr2 u2 -- flag )  STRING-EXT

I still don't like standard words being contractions when there's no
problem using full form. "STRING=" is much better by all means except
length, and it is only 3 letters longer. You don't know if programmer
wants to use "STR" prefix for dynamic strings or not.

> Compare the string specified by c-addr1 u1 to the string specified by
> c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
> not be modified. If the strings are of different lengths (u1<>u2),
> flag is FALSE. Otherwise the strings are compared, beginning at the
> given addresses, character by character, up to the equal length of the
> strings or until a difference is found. If the two strings are
> identical, flag is TRUE.

"Identical" or "equal"?

"Identical" means that objects are the same, "equal" means that objects
are allowed to be different yet still equal. Look up "object identity"
in programming literature.

> ISTR= ( c-addr1 u1 c-addr2 u2 -- flag )  STRING-EXT
>
> Compare the string specified by c-addr1 u1 to the string specified by
> c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
> not be modified. If the strings are of different lengths, flag is
> FALSE. Otherwise the strings are fetched, beginning at the given
> addresses, character by character. Characters are considered to match
> if they have the same numeric value, or, if the characters fall
> between ASCII values 'A' thru 'Z', they are considered to be identical
> to the corresponding character values in the range 'a' thru 'z'.

This is unacceptable since it ignores natural string comparison rules.

> If
> the two strings are identical, flag is TRUE, otherwise FALSE.

Same as above.

> Remarks
> -------
>
> Why standardize these words? They can be defined in ANS Forth, for
> example;
>
> : STR= COMPARE 0= ;
>
> For the following reasons:
>
> Most uses of COMPARE are for string equality or inequality for string
> prefixes.
>
> Using COMPARE to test for inequality is inefficient, as strings with
> unequal lengths can immediately be declared unequal; but COMPARE must
> continue to fetch and check characters to determine whether it should
> return greater-than or less-than, even though the result of this
> additional work will be discarded.
>
> For strings of equal length, the overhead is less significant, but the
> result of the comparison must still be adjusted to indicate the
> required result.

These are very weak arguments since all these cases are eliminated with
primitive peephole optimiser.

> Although string manipulation and handling is not employed extensively,
> text processing applications benefit significantly. Letting the
> compiler optimize uses of COMPARE 0= into a more efficient word is
> possible, but the programmer must employ an expensive COMPARE followed
> by tests to reduce the range of the result on systems that do not
> synthesize more efficient tests for equality.
>
> Case insensitive Forths require words to search the dictionary in a
> case-insensitive manner. These tests and tests for prefixes require
> that the tested argument is either converted to all upper case (or all
> lower case), which generally requires copying the original string to a
> transient area and performing a suitable case translation, followed by
> an expensive COMPARE operation.
>
> Why no case-insensitive COMPARE?
>
> There are a wide variety of case-insensitive words employed by Forths
> for this function; ICOMPARE, COMPARE(NC), UCOMPARE amongst others.
> Standardising such widely varying words would be problematic.

Standardising STRING= STR= S= $= and so on is still problematic, yet
you're writing this proposal somehow.

Standardising a case-insensitive COMPARE would solve the remaining string
comparison problems.

> Why no STR<, STR>, STR>= and so on?
>
> The implementation of any test beyond equality requires inspecting all
> the characters for the length of the shortest.

A proper implementation of an equality test for strings requires
inspecting all the characters anyway.

> The differentiation
> between greater-than and less-than is trivial for implementations of
> COMPARE to determine, as it is set on meeting the first non-equal
> character, or on exhausting one or other of the strings. All of these
> variants can be efficiently written using COMPARE.
>
> : STR< COMPARE 0 < ;
> : STR> COMPARE 0 > ;
> : STR>= COMPARE -1 > ;
>
> and so on.
>
> The current proposal does not allow the synthesizing of case
> insensitive comparisons due to a lack of appropriate ICOMPARE (or
> COMPARE(NC) etc).

And this is a major drawback of this proposal.
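
On the ordering words quoted above: ANS COMPARE returns -1 when the
first string sorts before the second, 0 when the strings are identical,
and 1 otherwise, so any ordering test reduces to a check on the sign of
that result. A small demonstration, assuming only the standard COMPARE
(SHOW-COMPARE is a hypothetical name used for this example):

```forth
\ Sketch only: prints the result of COMPARE for four pairs of strings.
: SHOW-COMPARE ( -- )
  S" abc" S" abd" COMPARE .     \ -1  first string sorts earlier
  S" abd" S" abc" COMPARE .     \  1  first string sorts later
  S" abc" S" abc" COMPARE .     \  0  identical
  S" ab"  S" abc" COMPARE . ;   \ -1  equal up to the shorter, u1 < u2
SHOW-COMPARE
```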

> Experience
> ----------
>
> As a case insensitive Forth, Win32Forth exposes ISTR= , used to search
> wordlists, as defined here, and supplies a STR= not based on COMPARE.
>
> MPE's VFX Forth supplies STR= S= and IS=. S= is a buffer compare with
> the signature ( c-addr1 c-addr2 u -- flag ); IS= is the case
> insensitive equivalent. S= and IS= can be efficiently synthesized from
> STR= and ISTR= respectively;
>
> : S= ( c-addr1 c-addr2 u -- flag ) TUCK STR= ;
> : IS= ( c-addr1 c-addr2 u -- flag ) TUCK ISTR= ;
>
> [ Does VFX Forth provide an equivalent to ISTR=? ]
>
> Gforth supplies STR= STR< and STRING-PREFIX?.
>
> STRING-PREFIX? can be synthesized from STR= ;
>
> : STRING-PREFIX? ( c-addr1 u1 c-addr2 u2 -- flag )
>   TUCK 2>R MIN 2R> STR= ;
>
> [ Information on other Forths required here ]

Create it. There are more freely available Forths than Win32Forth and
Gforth, and some of them are portable.
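
For readers on a system without STR=, the prefix test quoted above can
be tried with COMPARE 0= standing in for STR=. This is a sketch under
that assumption; SKETCH-PREFIX? is a hypothetical name, and the stack
effect follows the quoted definition (is the second string a prefix of
the first?).

```forth
\ Sketch only: same shape as the quoted STRING-PREFIX?, with COMPARE 0=
\ in place of STR= so that it loads on a plain ANS system.
: SKETCH-PREFIX? ( c-addr1 u1 c-addr2 u2 -- flag )
  TUCK 2>R MIN 2R>      \ clip string 1 to at most u2 characters
  COMPARE 0= ;          \ differing lengths compare unequal

: DEMO-PREFIX ( -- )
  S" forth-200x" S" forth" SKETCH-PREFIX? .    \ true flag
  S" for" S" forth" SKETCH-PREFIX? . ;         \ false: string 1 is shorter
```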

> Comments
> --------
>
> The ANS definition of COMPARE does not explicitly declare whether the
> input strings are read-only. Since COMPARE states that characters are
> "compared", the assumption is that they are read-only since no
> reasonable implementation needs to employ a destructive test.

This gives grounds to review the practice and amend the standard to
require non-destructive comparison.

> With
> case-insensitive string comparisons, this RfD makes it clear that they
> are read-only, as implementors might be tempted to lower- or upper-
> case one or both of the strings prior to comparison.
>
> Note that the implementations do not assign a meaning to the values of
> c-addr1 or c-addr2 when u1<>u2 (unequal length strings), or when
> u1=u2=0 (null strings which always return TRUE). Given that different
> implementations may address these in their own way, supplying invalid
> values of c-addr1 and c-addr2 in those cases (those that would cause
> an error if a single character was fetched from either of those
> addresses) is an ambiguous condition.
>
> Case-insensitivity only considers ASCII 'A' thru 'Z' to be equal to
> the corresponding ASCII characters 'a' thru 'z'. No other characters
> outside that range are considered equal.

This is definitely wrong. If your words are not useful for anything except
the internal problems of your implementation and your programs, they should
not be standardised at all, let alone take useful names.


-- 
HE CE3OH...

3. RfD: String comparison words version 0

RfD: String comparison words

Change history
2010-11-05 Initial proposal, incomplete

Problem
-------

Although ANS Forth provides COMPARE for string comparisons, it has two
attributes that make it problematic: it is case sensitive, and it is
expensive to execute due to the range of return values. This RfD
extends the functionality of COMPARE to provide comparisons that are
case-insensitive, and comparisons that test only for equality.

STR= ( c-addr1 u1 c-addr2 u2 -- n )  STRING-EXT

Compare the string specified by c-addr1 u1 to the string specified by
c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
not be modified. If the strings are of different lengths (u1 is not
equal to u2), n is zero (0). Otherwise the strings are compared,
beginning at the given addresses, character by character, up to the
equal length of the strings or until a difference is found. Characters
are considered identical if they have the same numeric value. If the
two strings are identical, n is one (1); otherwise n is zero (0).

ISTR= ( c-addr1 u1 c-addr2 u2 -- n )  STRING-EXT

Compare the string specified by c-addr1 u1 to the string specified by
c-addr2 u2. c-addr1 and c-addr2 point at read-only areas, which must
not be modified. If the strings are of different lengths, n is zero
(0). If both strings are null (u1 and u2 are both zero), n is one (1).
Otherwise the strings are fetched, beginning at the given addresses,
character by character. Characters are considered identical if they
have the same numeric value, or if the characters fall between ASCII
values 'A' thru 'Z' they are considered to be identical to the
corresponding character values in the range 'a' thru 'z'. If the two
strings are identical, n is one (1); otherwise n is zero (0).
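
The ISTR= behaviour described above could be sketched in ANS Forth
roughly as follows. This is a minimal sketch under stated assumptions,
not Win32Forth's or any other system's implementation: the SKETCH-
names are hypothetical, the result is a standard flag rather than the
value 1, and only ASCII 'a' thru 'z' are folded, as in the text above.

```forth
\ Sketch only: the SKETCH- names are hypothetical.
: SKETCH-UPC ( char -- char' )      \ fold ASCII 'a'..'z' to 'A'..'Z'
  DUP [CHAR] a [CHAR] z 1+ WITHIN IF  32 -  THEN ;

: SKETCH-ISTR= ( c-addr1 u1 c-addr2 u2 -- flag )
  ROT OVER <> IF  DROP 2DROP FALSE EXIT  THEN   ( c-addr1 c-addr2 u )
  0 ?DO                                         ( c-addr1 c-addr2 )
    OVER I CHARS + C@ SKETCH-UPC                \ character from string 1
    OVER I CHARS + C@ SKETCH-UPC                \ character from string 2
    <> IF  2DROP FALSE UNLOOP EXIT  THEN        \ stop at first difference
  LOOP
  2DROP TRUE ;                                  \ also reached when u = 0
```

Neither string is modified, and when the lengths differ or both are
zero no character is fetched from either address.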

Remarks
-------

Why standardize these words? They can be defined in ANS Forth;

: STR= COMPARE 0= ;
: ISTR= <definition required> ;

For the following reasons:

    * Many systems define STR= or ISTR= (or the equivalent under
      another name).
    * Case insensitive Forths require such a word to search the
      dictionary in a case-insensitive manner, and many expose these
      words or their equivalents to the user.
    * The commonest use of COMPARE is in the form COMPARE 0=.
    * They will be implemented more efficiently on many systems.

Although string manipulation and handling is not employed extensively,
text processing applications benefit significantly.

Why no case-insensitive COMPARE?

While 'a' and 'A' can be considered equal, it is problematic to assign
a meaning to a comparison of 'a' against 'B' in terms of 'greater
than' or 'less than'. Numerically, 'B' (66 decimal) is less than
'a' (97 decimal), but collation sequences are normally defined in
terms of case-insensitive tests; 'A' precedes 'ab', which precedes
'B'. This RfD does not attempt to address these issues.
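
The numeric ordering is easy to check on any ANS system. A small
sketch, assuming BASE is decimal (SHOW-ORDER is a hypothetical name
used only for this example):

```forth
\ Sketch only: shows that plain COMPARE orders by character code.
: SHOW-ORDER ( -- )
  [CHAR] B .  [CHAR] a .        \ prints 66 97
  S" a" S" B" COMPARE . ;       \ prints 1: "a" sorts after "B"
SHOW-ORDER
```

A case-insensitive collation would be expected to place "a" before
"B", which is the opposite of what COMPARE reports here.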

Note that the definitions of STR= and ISTR= do not assign a meaning to
the values of c-addr1 or c-addr2 when u1 <> u2 (unequal length
strings), or when u1=u2=0 (null strings). Given that different
implementations may address these in their own way, supplying invalid
values of c-addr1 and c-addr2 in those cases (those that would cause an
error if a single character was fetched from either of those addresses)
is an ambiguous condition.

Experience
----------

As a case insensitive Forth, Win32Forth exposes ISTR= , used to search
wordlists, as defined here.

<others>

4. RfD: Escaped Strings (Version 6.2)

5. RfD: Escaped Strings version 4

6. RfD: Escaped Strings (Version 6)

7. RfD: Escaped Strings S\" (version 5)

8. Strawpoll: better name for PARSE-WORD (was: RfD: PARSE-WORD)


