I have coded the Sieve of Eratosthenes and counted 64 bytes of code. I know this is not an all-encompassing benchmark, but it is a data point. Does anyone have numbers I can compare to? I am looking for any machine, any tool. Rick
bc: for(i=2;i<999;i++){if(!q[j=i]){i;while(j+=i<999)q[j]=1;}} 57 characters + 1 return. I guess it's possible to make it shorter. Regards, -Helmar
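For readers who do not use bc: the one-liner leans on bc's operator precedence, where assignment operators bind tighter than relational ones, so `j+=i<999` reads as `(j+=i)<999`. A rough Python equivalent, offered only as a sketch (the name `primes_below` is an illustration, not part of Helmar's post):

```python
def primes_below(limit=999):
    """Mirror of the bc loop: q[] becomes `composite`, `i;` becomes append."""
    composite = [False] * limit
    primes = []
    for i in range(2, limit):
        if not composite[i]:
            primes.append(i)        # bc prints i here
            j = i + i               # bc: j starts at i, then j += i
            while j < limit:        # bc: while ((j += i) < 999)
                composite[j] = True
                j += i
    return primes


if __name__ == "__main__":
    print(primes_below()[:10])
```

The Python version is far longer than 57 characters, which is rather the point of the golfed bc original.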
Thanks for the reply... I think. I wasn't talking about source code; I was talking about the executable. If you are going to quote an interpreted language, then you need to include the size of the interpreter, no? My goal is to compare the size of an executable written in the assembly language of a MISC-type CPU I am working on. I'd like to get an idea of the degree of code size optimization I've achieved. Since it is targeted at small FPGAs, small code size was a main goal and I would like to evaluate it. Any other ideas on reasonable benchmark programs for embedded processors? I don't have application code yet, so I can't use that. Rick
OK, I could see what can be done with a DOS .COM file. I guess it can be very small too. Hmm, good question. You do not include the size of the processor's microcode (if it has any), so perhaps the generated VM code of an interpreter could be the measure? I'm unsure; it's not always possible to figure this out. Do you want to use the sieve as a speed benchmark or for comparing the size of generated code? I guess it's suitable for both, but as a benchmark the result depends strongly on which "shortcuts" you take while sieving. Regards, -Helmar
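OK, with DOS and Fasm, 77 bytes including output routines and other overhead:
---------------------
        org 100h
        mov si,buf
        mov cx,1000
        xor ax,ax
        lea di,[si - 1]
        push di                 ; keep the buffer base on the stack
        rep stosb               ; clear 1000 flag bytes
        mov bx,di               ; bx = end of the flag buffer
sieve:
        pop di
        xor ax,ax
        lodsb                   ; fetch the next flag, advance si
        push di
        dec ax
        jns .a                  ; flag was set: composite, skip it
        pusha
        sub di,si
        mov ax,di
        neg ax                  ; ax = si - base = the prime itself
        mov cx,1
.da:    cwd                     ; push the decimal digits of ax
        div word [base]
        push dx
        inc cx
        or ax,ax
        jnz .da
        push word -16           ; -16 + "0" = 32, a space
.db:    pop dx                  ; print the space, then the digits
        add dl,"0"
        mov ah,2
        int 21h
        loop .db
.b:     mov byte[si-1],1        ; flag every multiple of the prime
        sub si,di               ; di = -prime, so this adds the prime
        cmp si,bx
        jna .b
        popa
.a:     cmp si,bx
        jc sieve
        int 20h
base:   dw 10
buf = $+1
---------------------
I guess it can be made shorter. Regards, -Helmar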
Depends on what you're measuring. In a recent project at work, we used Lua and we most certainly did count the cost of the interpreter and libraries (about 120k). But that doesn't tell the whole story-- the application is composed of eleven concurrent Lua processes and in our case, the interpreter and its libraries are shared. So if you look at it from a process level, you would count the interpreter and libraries eleven times. But if you looked at it from system level, there is only one copy of the interpreter and libraries. This doesn't make sense to me. I can see starting from some point, refining your instruction set, and then comparing that. But how can you meaningfully compare assembly language size without putting some parameters on this? For starters, what is the native machine word size? Do you have hardware multiply and divide? What kind of address space do you have in mind? I'm not sure the base premise here is valid, and I can't see how one can measure "reasonable." Give me more details about your processor and instruction set and I'm sure I can design benchmarks that can make your design look good or bad. Further, it seems to me that if you're targeting an FPGA, then what is likely going to matter more than artificial benchmarks is the ability to be flexible with the design. That is, your processor is parameterizable and you have in mind an extension architecture. At work, we're currently on a project that uses a MicroBlaze soft processor. When we started, we simply instantiated a fairly simple processor. As we started to dive into our application code, we found that we could benefit from an FPU-- so we added it. Now, we're finding that the software routines to do bitblt and compositing with an alpha channel are kinda slow. So we're dropping in some support for that. In other words, having a soft processor that we can extend means that we can bend the processor to the needs of the application.
It's not a panacea, but it certainly does change the design process in a way that we find rather mirrors Forth's interactive nature. In Forth, you can build up the language to support the needs of your application, tuning it as needed. With soft processors on FPGAs, we can (effectively) do the same thing at the hardware level. If anything, that's where I think the excitement is going in the future in embedded systems: being able to instantiate customized hardware. I have no idea what your design goals are with your processor, but it's something to consider.
There are always clever hacks one can do. I remember last year, someone in comp.lang.forth used some number theory to optimize a sieve in a non-obvious way. That was more for speed than size, but I'm sure someone with too much free time on their hands could have some insight that reduces size even more. But, I don't think that's useful in this case. Since his goal is to in some sense benchmark a processor's instruction set, I would think the goal would be not to come up with the smallest or most clever implementation, but to fix on a particular algorithm and implement that specific algorithm in the same exact way and then compare the size of the code generated.
There are no questions more hurting than questions that are not answered. He wants the shortest thing to compare. This is interesting indeed. If I remember right and assign this to the right topic, it was not as "non-obvious" as you might think. Those size-reduction games have some really good reasons. First of all, you learn the specific assembly language very well. He did not give an implementation, so at first I even thought he was talking about source code too. Well. Btw, I've found 4 more bytes to remove in the DOS implementation of the sieve. It's good to fill the brain with such things when other things (most probably unwanted) try to become {*filter*} for your thinking. I am currently waiting for something, so I do something to spend my time. I personally think it takes longer to figure out why something in a small program is the way it is than to invent it ;) So for me it's always fun. Regards, -Helmar
In what sense? As an intellectual exercise, sure. But I thought the point here was that he wanted to have some kind of meaningful comparison of instruction set sizes, not comparing the cleverness of programmers. Perhaps. I forget that everyone here is a genius. Sure, but how does it help compare the size of instruction sets which again was what I thought started this thread?
It's fair to count whatever runtime support is required. But for embedded Forths, it's common to keep the compiler/interpreter etc. in a host, and have a minimal run-time requirement in the target. SwiftX targets typically have about 6K of support code, but even that is a "representative set of capabilities" and can be pruned if space is an issue. My embedded systems programming course has a standard course problem involving the coding of traffic lights for a particular intersection. I give a prize for the smallest compiled target that runs correctly including all support code, and a separate prize for the most readable source. They are never the same, because the extreme measures that students go through to get the size prize adversely impact readability; the takeaway lesson is that you have tradeoffs, and need to make choices. Anyway, typical winning sizes are about 800 bytes on a 16-bit target, and about 1100 on a 32-bit target, for the total executable package (including stripped kernels). What instruction set are you compiling to? The one in your target? Measurements on existing CPUs are unlikely to be relevant. You're unlikely to learn much from the Sieve, because it's pretty unrepresentative of embedded-type applications. Suggestion: Get evaluation versions of cross-compilers from FORTH, Inc. and MPE. They probably come with example applications (I know SwiftX does). Look at those on an architecture as close as possible to the architecture you have in mind. Do dumps and SEEs and look at the code density. Cheers, Elizabeth -- ================================================== Elizabeth D. Rather (US & Canada) 800-55-FORTH FORTH Inc. +1 310.999.6784 5959 West Century Blvd. Suite 700 Los Angeles, CA 90045 http://www.**--****.com/ "Forth-based products and Services for real-time applications since 1973." ==================================================
Interesting. 46 instructions, 134 bytes on an MSP430. No startup code, just the (almost) original Sieve. Gary
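To keep the figures quoted so far in one place, here is a small Python sketch that tabulates them. The numbers come straight from the posts above; the dictionary labels and variable names are illustrative only, and the entries are not like-for-like (Helmar's figure includes the output routines, Gary's excludes startup code), so the ratios are rough orientation at best.

```python
# Code sizes in bytes, as reported in the thread (not directly comparable).
reports = {
    "Rick, MISC CPU": 64,
    "Helmar, DOS .COM with Fasm (includes output)": 77,
    "Gary, MSP430 (no startup code)": 134,
}

# Gary also gave an instruction count, which yields an average density.
msp430_instructions = 46
msp430_bytes_per_instruction = reports["Gary, MSP430 (no startup code)"] / msp430_instructions

baseline = reports["Rick, MISC CPU"]
ratios = {name: size / baseline for name, size in reports.items()}

if __name__ == "__main__":
    for name, size in reports.items():
        print(f"{name}: {size} bytes, {ratios[name]:.2f}x Rick's figure")
    print(f"MSP430 average: {msp430_bytes_per_instruction:.2f} bytes per instruction")
```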
Thanks for the reply. This is the sort of thing I was looking for. No startup or printout code, just the raw code to do the computation which is how this CPU will be used. The algorithm I used was the one from J. Gilbreath, Byte Magazine, 9/81. This algorithm assumes the multiples of two do not need to be searched. Was your code in Forth, C or other? Is this the algorithm you used or did it search the entire field? Here is the code I manually converted to pseudo Forth assembly language for my machine. My code calls FILL, but otherwise is just machine code. I guess I should count the bytes for FILL. I think that is less than 10 bytes of code.
---------------------
8190 CONSTANT SIZE
CREATE FLAGS SIZE ALLOT

: DO-PRIME ( -- n )
  FLAGS SIZE -1 FILL
  0 SIZE 0 DO
    I FLAGS + C@ IF
      I 2* 3 + DUP I +
      BEGIN DUP SIZE < WHILE
        DUP FLAGS + 0 SWAP C!
        OVER +
      REPEAT
      2DROP 1+
    THEN
  LOOP ;

: $SIEVE$ ( -- )
  BEGIN [$ DO-PRIME SIZE $] UNTIL
  ." Eratosthenes sieve " . ." Primes" ;
---------------------
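Anyone porting DO-PRIME to another machine may want a reference result to check against. The following Python sketch follows the same odd-only scheme, in which flag index i stands for the odd number 2i+3. It is a cross-check under that reading of the Forth source, not Rick's code; the function name `do_prime` simply echoes the Forth word.

```python
SIZE = 8190  # same constant as in the Forth source


def do_prime(size=SIZE):
    """Count primes with the odd-only sieve; index i represents 2*i + 3."""
    flags = [True] * size            # FLAGS SIZE -1 FILL
    count = 0
    for i in range(size):            # SIZE 0 DO ... LOOP
        if flags[i]:                 # I FLAGS + C@ IF
            prime = 2 * i + 3        # I 2* 3 +
            k = i + prime            # DUP I +  (index of 3 * prime)
            while k < size:          # BEGIN DUP SIZE < WHILE
                flags[k] = False     # DUP FLAGS + 0 SWAP C!
                k += prime           # OVER +
            count += 1               # 2DROP 1+
    return count


if __name__ == "__main__":
    print(do_prime())  # prints 1899
```

With SIZE at 8190 the flags cover the odd numbers from 3 to 16381, and the count of 1899 is the figure traditionally quoted for this benchmark.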
Yes, I guess that is true. Even though I designed my instruction set, there are still things I learn by writing code. I will also say that the instruction set is not fixed. I am trying to figure out the best balance between speed, code size and implementation size in the FPGA. Initially I picked an instruction set and optimized things like the coding for the machine code to minimize the decode. But I see now I need to do more benchmarking before I should start optimizing the machine code and the HDL. Just now I posted the code in Forth above. I just figured everyone would have this code since it seems to be the "classic" example to use in a sieve benchmark in Forth. I also enjoy this. I don't yet have a real application for this CPU, but I may need to convert an HDL design to use a CPU in order to save gates. Most of the logic is very slow with stages running at 8 kHz, 1 kHz and 100 Hz. I can do it all in software with hardware interfaces to the outside. But the 8 kHz stage might be done in hardware still so that it can be speeded up to 80+ kHz for a similar application with a higher data rate.
There was an example on the VAX. You read it, and then suddenly the code was over, leaving you wondering "where did it happen?". It was extremely short. It had something to do with a single instruction for flipping a bit in an array. Now this could have something to do with a decision to put an instruction in microcode that would help with precisely this kind of benchmark. So no, I don't think you can draw many conclusions from this. If you choose one benchmark, then you must choose very carefully. Elizabeth Rather's one, traffic lights, seems vastly superior. Better still are the language shoot-outs, with an array of benchmarks. What is remarkable about these shoot-outs is that, most of the time, some benchmarks are not implemented for a particular language because it is too much trouble. This, if anything, shows that no language is good for everything. Groetjes Albert.