
SweetCX16


On 1/10/2022 at 5:50 PM, BruceMcF said:

There ought to be differences throughout ... it is not attempting to be a port of Woz's code, it is attempting to be an open source VM that executes Sweet16 source, and beyond that is explicitly focusing on using a faster approach to routine dispatch.

And in part it is explicitly pursuing a different speed / codesize tradeoff than Woz's Sweet16 because I doubt I could pursue Woz's specific goals and do any better than he did.

Ed and I were comparing original source to original "source"... turns out SWEET16 itself either has variations or typos in the wild.

I get that yours is an open-source implementation of the API.  It's also more likely to work for me than the original.


On 1/10/2022 at 5:50 PM, BruceMcF said:

There ought to be differences throughout ... it is not attempting to be a port of Woz's code, it is attempting to be an open source VM that executes Sweet16 source, and beyond that is explicitly focusing on using a faster approach to routine dispatch.

And in part it is explicitly pursuing a different speed / codesize tradeoff than Woz's Sweet16 because I doubt I could pursue Woz's specific goals and do any better than he did.

 

Given your suggestions about better use of SWEET16 -- i.e. as a setup rather than something for inner loops -- maybe there's not much benefit to using something like it to replace bits of the KERNAL.  Except for KERNAL routines used for system initialization, of course.

 


On 1/12/2022 at 10:01 AM, rje said:

Given your suggestions about better use of SWEET16 -- i.e. as a setup rather than something for inner loops -- maybe there's not much benefit to using something like it to replace bits of the KERNAL.  Except for KERNAL routines used for system initialization, of course.

Yes, that's the basic idea ... routines used for system initialization. The original Sweet16 saved more space in the Apple ROM(s?) than Sweet16 used, so it was a "free" resource in a space consumption sense, at a time when ROM cost much more per KB than it does today.

In the context of the X16, the most appealing aspect may be the ability to conserve on relatively scarce Low RAM if a Sweet16 VM is available.

Edited by BruceMcF

On 1/12/2022 at 3:23 PM, BruceMcF said:

Yes, that's the basic idea ... routines used for system initialization. The original Sweet16 saved more space in the Apple ROM(s?) than Sweet16 used, so it was a "free" resource in a space consumption sense, at a time when ROM cost much more per KB than it does today.

In the context of the X16, the most appealing aspect may be the ability to conserve on relatively scarce Low RAM if a Sweet16 VM is available.

I'm still thinking that a 16K ROM bank could benefit from Sweet16... assuming that you only want to work with that 16K, and assuming you've got more stuff to put in that 16K than you would normally be able to squeeze into it.


Apple's garage computer was a piece of wizardry. Woz made an amazing machine with what he had. Jobs made the world eat it.

I was picked to help my high school select a system for our first computer lab back in the early '80s (I was one of a few kids in school who already owned a home computer). Tandy vs Apple vs Commodore. I picked the PET as it was a serious machine (I had seen all three in action and compared their specs); the school (as did many) ate the Apple, because of educational discounts meant to indoctrinate the youth to continue to eat the Apple. We still see the effects to this day.

Apple = Eye candy. Get a real system.

The 6809 was a wonderful processor that I have worked with, both indirectly (with the SuperPET SP9000) and by coding the chip directly. I wish it had taken off. Really.


On 1/13/2022 at 10:43 PM, codewar65 said:

The 6809 was a wonderful processor that I have worked with, both indirectly (with the SuperPET SP9000) and by coding the chip directly. I wish it had taken off. Really.

My computer knowledge really took off in grade 12 when the typing teacher removed me from the regular class (I think my knowledge and ability really challenged her). I was given a SuperPET, a disk drive and the complete set of Waterloo languages with manuals to work with in another room (so I wouldn't be disruptive to the main class or teacher). Structured BASIC, Pascal and Fortran were real joys; APL was just neat, but my background math knowledge was lacking. Assembler on the 6809 and 6502 interested me; I should have done more. COBOL didn't make sense then (or now) - that experience suggested I avoid working with it for Y2K.

From reading about the evolution of the Waterloo languages, there must have been a couple of engineering/CS students who had exposure at the university level. Hopefully they benefited from the experience.  I did!

Edited by Edmond D

On 1/13/2022 at 10:43 PM, codewar65 said:

the school (as did many) ate the Apple.

In Ontario (a large, rich province in Canada), a couple of PETs were in the math classrooms in the early '80s when I was in grade 7. In New Brunswick (a smaller, economically challenged province), I got to use PETs from 1984-1987. The main lab had several PETs chained to one disk drive - most likely the cheapest option. There were a couple of single machines, the versions of which I don't remember, save the SuperPET that helped make me a "super" programmer. 🤓

My understanding is that Commodore pushed heavily into the school systems in Canada.
 


On 1/14/2022 at 12:10 AM, codewar65 said:

Waterloo on the SuperPET. *warm fuzzies* BASIC, Pascal, and FORTRAN. They never offered APL at my college. I did take up COBOL at university and ended up doing stupid Y2K stuff at a job decades later...
 

I had to take APL in college. I remember virtually nothing of it, and it is truly write-only code (at least in the environment we used).


On 1/14/2022 at 12:34 AM, codewar65 said:

If the 6809 sold at the same price or less than a 6502, would Sweet16 even exist?

If it was sold at the same price or less than a 6502 at the time of 6502 introduction, possibly not ... the 6809 is a fine instruction set.

Regarding the original topic, I've been looking at something I mentioned in another thread:

Quote

However, even with a different dispatch model, if trying to squeeze object size in a "Sweet 16 replacement", rather than optimizing for speed, I could imagine having a single indirect load and a single indirect store routine, each of which works out from the bits of the opcode and the status of the carry flag whether it is pre-decrement or post-increment and whether it is a single or double byte operation, covering 7 operations in two routines. Direct register moves could be handled by putting source in Y and destination in X, at the cost of using absolute rather than direct addressing for the Y-indexed operation, giving one routine for the two direct ones. One could imagine the immediate register load being run by the two-byte accumulator load, setting the indirect source register to R15, the PC register, and using Y-indexed store, so the immediate load is taken over by the single indirect load routine as well.

Then at the cost of three more zero page bytes ... two more bytes in a dedicated "register 17" initialized to $0001, and one set to either $80 or $00 based on whether adding or subtracting, setting up the correct target and operand index in X and Y would allow all five arithmetic operations to be done in a single routine. If that was done by shifting the instruction one bit to the left and using the carry flag and sign flag to split the code set into quarters, you might restrict the jump table to the $0n instructions, making it only 26-32 bytes long.

After looking more closely, the same routine can handle byte and word indirect load, the same can handle byte and word indirect store, and with an entry stub byte pop and byte store-pop (before SIGN is examined to see whether to run load or store). So that's six operations with two routines.

The same routine can handle add and subtract, and with an entry stub compare. A single routine can handle direct load or store, a single routine can handle increment and decrement. So that's seven more operations with three routines.

Among the "embedded register" operations, only word pop (POPD) and SET are singletons: the first decrements twice in the process, which cannot be handled by a prefix to indirect load or store (which post-increment), and setting a value with the contents of the accumulator doesn't make any sense.

Even though each routine is longer than the SweetCX16 routines, the reduction in number of routines to seven to cover 15 operations  makes the codesize smaller.

In the dispatch, after branching to handle the "Branch & etc." ($0n) ops, the bit4 value is used to set SIGN to $00 or $FF, then bit4 is cleared and the opcode is shifted with LSR four times to get one of eight index values from 0 to 14. Storing that in X allows a JMP (REGOPS,X) based on a 16-byte (rather than 30-byte) vector table, saving 14 more bytes.
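The decode described above can be modelled in a few lines of Python. This is only a sketch of the arithmetic, not the 65C02 code itself: the function name `decode_register_op` is made up for illustration, and the polarity of SIGN (bit4 set gives $FF) is an assumption, since the post only says SIGN becomes $00 or $FF.

```python
def decode_register_op(opcode):
    """Split a Sweet16 register opcode ($1n-$Fn) into (sign, table_index, register).

    Hypothetical helper: models the decode arithmetic only.
    """
    if not 0x10 <= opcode <= 0xFF:
        raise ValueError("the $0n ops are dispatched separately")
    register = opcode & 0x0F                  # low nibble selects R0-R15
    sign = 0xFF if opcode & 0x10 else 0x00    # assumed polarity: bit4 set gives $FF
    table_index = (opcode & 0xEF) >> 4        # clear bit4, then LSR four times
    return sign, table_index, register


# Each odd/even opcode pair lands on the same table entry and differs only in SIGN.
print(decode_register_op(0xE3))  # even op of the $En/$Fn pair, register 3
print(decode_register_op(0xF3))  # odd op of the same pair, register 3
```

Because the index only takes the eight even values from 0 to 14, the vector table needs eight two-byte entries, which is where the 16-byte figure comes from.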

Handily, the index (an even number from 0 to 14) is in both A and X on dispatch, so if the index is used (as in indirect load and indirect store, to tell whether it's a byte or a word operation), you can do "TYX: TAY" to save the index where it can be tested directly with "CPY #n".

I haven't tackled the Branch operations, but I am thinking a similar process can be used with the low bit of the opcode, since 8 of 13 are by pairs: Branch No Carry / Branch Carry; Branch Plus / Branch Minus; Branch Zero / Branch Nonzero; and Branch if Minus 1 / Branch if not Minus 1. If Carry, Minus, Nonzero, and not-Minus 1 are each tested with a result of #$00 if the condition is met and #$FF if the condition is not met, then jumping to BRANCH with EOR SIGN will invert the status for the "odd" opcodes (Carry, Minus, Nonzero, not-Minus 1), and leave the status alone for the "even" opcodes. Then a branch is performed if the result after EOR SIGN is #$FF. Branch Always simply calls BRANCH with a status of $00, since Branch Always is an "odd" op. So that handles 9 of 13 ops. RTN is easy, since it is op $00, "CMP #0 : BEQ RTN". BK, RS and BS are all singletons, but the dispatch can use the "SIGN" value to distinguish between BK and RS and jump to BS on its own. So filter out RTN, extract SIGN based on the low bit, clear the low bit, transfer to X and do an X-indexed jump on a 14-byte index table ... rather than 26 bytes in SweetCX16 ... which crunches the size even more.
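The EOR SIGN trick for the paired branches can be checked with a small Python model. This is a sketch under stated assumptions: the function name `branch_taken` and the flag names in the dictionary are hypothetical, and the opcode numbering ($01 Branch Always, $02/$03 carry pair, $04/$05 plus/minus pair, $06/$07 zero/nonzero pair, $08/$09 minus-one pair) follows the standard Sweet16 opcode table.

```python
def branch_taken(opcode, flags):
    """Model the shared BRANCH test for Sweet16 ops $01-$09.

    flags is a dict of booleans with the hypothetical keys
    "carry", "minus", "nonzero" and "not_minus_one".
    """
    if not 0x01 <= opcode <= 0x09:
        raise ValueError("only the nine always/conditional branch ops are modelled")
    sign = 0xFF if opcode & 0x01 else 0x00   # odd ops invert the test result
    pair = opcode & 0x0E                     # low bit cleared: one entry per pair
    if pair == 0x00:
        status = 0x00                        # Branch Always enters with status $00
    else:
        condition = {0x02: "carry", 0x04: "minus",
                     0x06: "nonzero", 0x08: "not_minus_one"}[pair]
        status = 0x00 if flags[condition] else 0xFF   # $00 when the condition is met
    return (status ^ sign) == 0xFF           # branch when EOR SIGN yields $FF


flags = {"carry": True, "minus": False, "nonzero": True, "not_minus_one": True}
print(branch_taken(0x03, flags))  # Branch Carry with carry set
print(branch_taken(0x02, flags))  # Branch No Carry with carry set
```

Only one test per pair is needed, which is what lets the odd and even ops of a pair share a single table entry.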

The hope would be to get smaller than the original Sweet16, so that there is a "faster, large footprint" version and a "slower, smaller footprint" version.

 

Edited by BruceMcF

Yep, I was messing with those opcodes, grouping them one way and another, thinking "surely a little decode can reduce size".  I'm sure Woz didn't decode because 300 bytes was the golden compromise for him.

 


On 1/14/2022 at 11:46 PM, rje said:

Yep, I was messing with those opcodes, grouping them one way and another, thinking "surely a little decode can reduce size".  I'm sure Woz didn't decode because 300 bytes was the golden compromise for him.

Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.

 

Edited by BruceMcF

On 1/16/2022 at 9:01 AM, BruceMcF said:

Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.

However, after drafting several approaches, the game is not worth the candle ... the smallest I can come up with, without going in and copying and pasting from Woz's code, gets down to 416 bytes from the 496 bytes of the smaller version of the "pure" JMP (optable,X) version that jumps directly to each op. With a drop down to 394 bytes available from just adopting Woz's code (including save/restore register code that Woz's version gets from the Apple II ROM), it's not worth it.

Not, that is, unless someone could find space savings IN Woz's version by doing some decoding, but as spaghetti coded as the original Sweet16 is, that someone would not be me.

If either version of my Sweet16 and Woz's original are assembled to be at the END of GoldenRAM, they each would have a different start point.

However, after translating a copy of Woz's code to acme assembler, with "SAVE" and "RESTORE" in front, I find there are six bytes at the end free before the end of the page. Then I could assemble versions of all three with a two routine jump table at the TOP of  golden RAM ($07FA and $CFFA for CX16 and C64 respectively), one for entering Sweet16, the other for entering either SAVE or RESTORE (based on carry set or carry clear). Then the starting point of the routine is flexible, C64 code could enter Sweet16 with JSR $CFFE and CX16 code with JSR $07FE.

That would make it possible to assemble Sweet16 code independent of the choice of Sweet16 VM.

To fit into that, I'm going to shrink the size of my "two page" version by using INC Register and DEC Register subroutines, which will free up as much space as it frees up, and leave my "3 page" version as the full fat speed optimized version.

Edit: What I get is that the "full fat" Sweet16c would occupy $0500-$07FF of Golden RAM, leaving one page (256 bytes) free at $0400. The "two page" Sweet16c2 would occupy $061C-$07FF, leaving 540 bytes (two pages plus 28 bytes) of Golden RAM available at $0400. And the adapted "Sweet 16 original" with SAVE/RESTORE code included and the jump table would occupy $066F-$07FF, leaving 623 bytes (two pages plus 111 bytes) of Golden RAM free.
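The free-space figures follow directly from the start addresses, and the arithmetic is easy to recheck whenever the assembled sizes change. A minimal Python sketch, assuming Golden RAM runs from $0400 and each VM is assembled to end at $07FF as described above; the helper name `free_below` is made up for illustration:

```python
GOLDEN_RAM_START = 0x0400   # first byte of CX16 Golden RAM, as used in the post
PAGE_SIZE = 256


def free_below(vm_start):
    """Return (free_bytes, whole_pages, leftover_bytes) below a VM that runs to $07FF."""
    free_bytes = vm_start - GOLDEN_RAM_START
    whole_pages, leftover = divmod(free_bytes, PAGE_SIZE)
    return free_bytes, whole_pages, leftover


# Start addresses quoted in the post for the three candidate VMs.
for name, start in (("Sweet16c", 0x0500),
                    ("Sweet16c2", 0x061C),
                    ("Sweet16 original", 0x066F)):
    print(name, free_below(start))
```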

TBC, none of those are tested code, so the final numbers may vary following bug fixes, but those should be the right ball park.

Edited by BruceMcF
