
SweetCX16



On 1/10/2022 at 5:50 PM, BruceMcF said:

There ought to be differences throughout ... it is not attempting to be a port of Woz's code, it is attempting to be an open source VM that executes Sweet16 source, and beyond that is explicitly focusing on using a faster approach to routine dispatch.

And in part it is explicitly pursuing a different speed / codesize tradeoff than Woz's Sweet16 because I doubt I could pursue Woz's specific goals and do any better than he did.

Ed and I were comparing original source to original "source"... turns out SWEET16 itself either has variations or typos in the wild.

I get that yours is an open-source implementation of the API.  It's also more likely to work for me than the original.


On 1/10/2022 at 5:50 PM, BruceMcF said:

There ought to be differences throughout ... it is not attempting to be a port of Woz's code, it is attempting to be an open source VM that executes Sweet16 source, and beyond that is explicitly focusing on using a faster approach to routine dispatch.

And in part it is explicitly pursuing a different speed / codesize tradeoff than Woz's Sweet16 because I doubt I could pursue Woz's specific goals and do any better than he did.

 

Given your suggestions about better use of SWEET16 -- i.e. as a setup rather than something for inner loops -- maybe there's not much benefit to using something like it to replace bits of the KERNAL.  Except for KERNAL routines used for system initialization, of course.

 


On 1/12/2022 at 10:01 AM, rje said:

Given your suggestions about better use of SWEET16 -- i.e. as a setup rather than something for inner loops -- maybe there's not much benefit to using something like it to replace bits of the KERNAL.  Except for KERNAL routines used for system initialization, of course.

Yes, that's the basic idea ... routines used for system initialization. The original Sweet16 saved more space in the Apple ROM(s?) than Sweet16 used, so it was a "free" resource in a space consumption sense, at a time when ROM cost much more per KB than it does today.

In the context of the X16, the most appealing aspect may be the ability to conserve on relatively scarce Low RAM if a Sweet16 VM is available.

Edited by BruceMcF

On 1/12/2022 at 3:23 PM, BruceMcF said:

Yes, that's the basic idea ... routines used for system initialization. The original Sweet16 saved more space in the Apple ROM(s?) than Sweet16 used, so it was a "free" resource in a space consumption sense, at a time when ROM cost much more per KB than it does today.

In the context of the X16, the most appealing aspect may be the ability to conserve on relatively scarce Low RAM if a Sweet16 VM is available.

I'm still thinking that a 16K ROM bank could benefit from Sweet16... assuming that you only want to work with that 16K, and assuming you've got more stuff to put in that 16K than you would normally be able to squeeze into it.


Apple's garage computer was a piece of wizardry. Woz made an amazing machine with what he had. Jobs made the world eat it.

I was picked out to help my high school select a system for our first computer lab back in the early '80s (I was one of a few kids in school who already owned a home computer). Tandy vs Apple vs Commodore. I picked the PET as it was a serious machine (I'd seen all three in action and read the specs); the school (as did many) ate the Apple, because of educational discounts meant to indoctrinate the youth to continue to eat the Apple. We still see the effects to this day.

Apple = Eye candy. Get a real system.

The 6809 was a wonderful processor that I have worked with, both indirectly (with the SuperPET SP9000) and by coding the chip directly. I wish it took off. Really.


On 1/13/2022 at 10:43 PM, codewar65 said:

The 6809 was a wonderful processor that I have worked with, both indirectly (with the SuperPET SP9000) and by coding the chip directly. I wish it took off. Really.

My computer knowledge really took off in grade 12 when the typing teacher removed me from the regular class (I think my knowledge and ability really challenged her). I was given a SuperPET, a disk drive and the complete set of Waterloo languages with manuals to work with in another room (so I wouldn't be disruptive to the main class or teacher). Structured BASIC, Pascal and Fortran were real joys; APL was just neat, but my background math knowledge was lacking. Assembler on the 6809 and 6502 interested me; I should have done more. COBOL didn't make sense then (or now) - that experience suggested I avoid working with it for Y2K.

From reading about the evolution of the Waterloo languages, there must have been a couple of engineering/CS students who had exposure at the university level. Hopefully they benefited from the experience.  I did!

Edited by Edmond D

On 1/13/2022 at 10:43 PM, codewar65 said:

the school (as did many) ate the Apple.

In Ontario (a large, rich province in Canada), a couple of PETs were in the math classrooms in the early '80s when I was in grade 7. In New Brunswick (a smaller, economically challenged province), I got to use PETs from 1984-1987. The main lab had several PETs chained to one disk drive - most likely the cheapest option. There were a couple of single machines, whose versions I don't remember, save the SuperPET that helped make me a "super" programmer. 🤓

My understanding is that Commodore pushed heavily into the school systems in Canada.
 


On 1/14/2022 at 12:10 AM, codewar65 said:

Waterloo on the SuperPET. *warm fuzzies* BASIC, Pascal, and FORTRAN. They never offered APL at my college. I did take up COBOL at university and ended up doing stupid Y2K stuff at a job decades later...
 

I had to take APL in college. I remember virtually nothing of it, and it is truly write-only code (at least in the environment we used).


On 1/14/2022 at 12:34 AM, codewar65 said:

If the 6809 sold at the same price or less than a 6502, would Sweet16 even exist?

If it was sold at the same price or less than a 6502 at the time of 6502 introduction, possibly not ... the 6809 is a fine instruction set.

Regarding the original topic, I've been looking at something I mentioned in another thread:

Quote

However, even with a different dispatch model, if trying to squeeze object size in a "Sweet 16 replacement", rather than optimizing for speed, I could imagine having a single indirect load and a single indirect store routine, which works out from the bits of the opcode and the status of the carry flag whether it is pre-decrement or post-increment and whether it is a single or double byte operation, covering 7 operations in two routines. Direct register moves could be handled by putting source in Y and destination in X, at the cost of using absolute rather than direct addressing for the Y-indexed operation, giving one routine for the two direct ones. One could imagine the immediate register load being run by the two-byte accumulator load, setting the indirect source register to R15, the PC register, and using Y-indexed store, so the immediate load is taken over by the single indirect load routine as well.

Then at the cost of three more zero page bytes ... two more bytes in a dedicated "register 17" initialized to $0001, and one set to either $80 or $00 based on whether adding or subtracting ... setting up the correct target and operand index in X and Y would allow all five arithmetic operations to be done in a single routine. If that was done by shifting the instruction one bit to the left and using the carry flag and sign flag to split the code set into quarters, you might restrict the jump table to the $0n instructions, making it only 26-32 bytes long.

After looking more closely, the same routine can handle byte and word indirect load, the same can handle byte and word indirect store, and with an entry stub byte pop and byte store-pop (before SIGN is examined to see whether to run load or store). So that's six operations with two routines.

The same routine can handle add and subtract, and with an entry stub compare. A single routine can handle direct load or store, a single routine can handle increment and decrement. So that's seven more operations with three routines.

Among the "embedded register" operations, only word pop (POPD) and SET are singletons: the first decrements twice in the process, which cannot be handled by a prefix to indirect load or store (which post-increment), and setting a value with the contents of the accumulator doesn't make any sense.

Even though each routine is longer than the SweetCX16 routines, the reduction in number of routines to seven to cover 15 operations  makes the codesize smaller.

In the dispatch, after branching off to handle the "Branch & etc." ($0n) ops, the bit4 value is used to set SIGN to $00 or $FF. Then clearing bit4 and doing LSR four times gives one of eight index values from 0 to 14, which is stored in X so that JMP (REGOPS,X) works from a 16-byte (rather than 30-byte) vector table, saving 14 more bytes.
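As a rough check of that decode, here is a small Python model (not 6502 code; the function name `decode_register_op` is made up for illustration). It assumes the register ops are the bytes $10-$FF, with the operation in the high nibble and the register in the low nibble, and returns the SIGN value plus the even table index described above.

```python
def decode_register_op(opcode: int) -> tuple[int, int]:
    """Model the halved-table decode for Sweet16 register ops ($1n-$Fn).

    Returns (sign, index): sign is $FF when bit 4 is set, else $00;
    index is the even offset 0..14 into an 8-entry, 16-byte vector table.
    """
    if not 0x10 <= opcode <= 0xFF:
        raise ValueError("register ops are $10-$FF; $0n ops are dispatched separately")
    sign = 0xFF if opcode & 0x10 else 0x00
    # Clear bit 4, then four right shifts drop the register nibble.
    index = (opcode & 0xEF) >> 4
    return sign, index


if __name__ == "__main__":
    for op in (0x1A, 0x2A, 0x35, 0xF0):
        sign, index = decode_register_op(op)
        print(f"${op:02X}: SIGN=${sign:02X} index={index}")
```

Each odd/even pair of operations ($2n/$3n, $4n/$5n, and so on) lands on the same table entry, and SIGN tells the shared routine which of the two it is running.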

Handily, the index (even numbers from 0 to 14) are in both A and X on dispatch, so if the index is used (as in indirect loads and indirect store to tell whether it's a byte or a word load), you can do "TYX: TAY" to save the index where it can be tested directly with "CPY #n".

I haven't tackled the Branch operations, but I am thinking a similar process can be used with the low bit of the opcode, since 8 of 13 come in pairs: Branch No Carry / Branch Carry; Branch Plus / Branch Minus; Branch Zero / Branch Nonzero; and Branch if Minus 1 / Branch if Not Minus 1. If Carry, Minus, Nonzero, and Not-Minus-1 are each tested with a result of #$00 if the condition is met and #$FF if the condition is not met, then jumping to BRANCH with EOR SIGN will invert the status for the "odd" ops (Carry, Minus, Nonzero, Not-Minus-1), and leave the status alone for the "even" ops. Then a branch is performed if the result after EOR SIGN is #$FF. Branch Always simply calls BRANCH with a status of $00, since Branch Always is an "odd" op. So that handles 9 of 13 ops.

RTN is easy, since it is op $00: "CMP #0 : BEQ RTN". BK, RS and BS are all singletons, but the dispatch can use the "SIGN" value to distinguish between BK and RS and jump to BS on its own. So filter out RTN, extract SIGN based on the low bit, clear the low bit, transfer to X and do an X-indexed jump on a 14-byte index table ... rather than 26 in SweetCX16 ... which crunches the size even more.
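The EOR SIGN idea for the paired branches can be sanity-checked with a few lines of Python. This is only a model of the logic, not VM code: `branch_taken` is a hypothetical name, and `odd_condition_met` stands for whichever "odd" condition of the pair (Carry, Minus, Nonzero, Not-Minus-1) is being tested.

```python
def branch_taken(opcode: int, odd_condition_met: bool) -> bool:
    """Model the shared BRANCH test for Sweet16 ops $01-$09.

    The low bit of the opcode gives SIGN ($FF for odd ops, $00 for even ops).
    The condition test yields $00 when the odd condition holds, $FF otherwise.
    The branch is taken when (status EOR SIGN) is $FF.
    """
    if not 0x01 <= opcode <= 0x09:
        raise ValueError("only BR and the four branch pairs are modelled")
    sign = 0xFF if opcode & 0x01 else 0x00
    status = 0x00 if odd_condition_met else 0xFF
    return (status ^ sign) == 0xFF


if __name__ == "__main__":
    # BC ($03) and BNC ($02) share one carry test.
    for carry in (False, True):
        print(f"carry={carry}: BNC taken={branch_taken(0x02, carry)}, "
              f"BC taken={branch_taken(0x03, carry)}")
```

Branch Always ($01) fits the same routine by always passing a status of $00, which corresponds to `odd_condition_met=True` here.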

The hope would be to get smaller than the original Sweet16, so that there is a "faster, large footprint" version and a "slower, smaller footprint" version.

 

Edited by BruceMcF

Yep, I was messing with those opcodes, grouping them one way and another, thinking "surely a little decode can reduce size".  I'm sure Woz didn't decode because 300 bytes was the golden compromise for him.

 


On 1/14/2022 at 11:46 PM, rje said:

Yep, I was messing with those opcodes, grouping them one way and another, thinking "surely a little decode can reduce size".  I'm sure Woz didn't decode because 300 bytes was the golden compromise for him.

Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.

 

Edited by BruceMcF

On 1/16/2022 at 9:01 AM, BruceMcF said:

Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.

However, after drafting several approaches, the game is not worth the candle ... the smallest I can come up with, without going in and copy and pasting from Woz's code, gets down to 416 bytes from the 496 bytes of the smaller version of the "pure" JUMP (optable,X) version that jumps directly to each OP. With a drop down to 394 bytes available from just adopting Woz's code, (including save/restore register code that Woz's version gets from the Apple II ROM), it's not worth it.

Not, that is, unless someone could find space savings IN Woz's version by doing some decoding, but as spaghetti coded as the original Sweet16 is, that someone would not be me.

If either version of my Sweet16 and Woz's original are assembled to be at the END of GoldenRAM, they each would have a different start point.

However, after translating a copy of Woz's code to acme assembler, with "SAVE" and "RESTORE" in front, I find there are six bytes at the end free before the end of the page. Then I could assemble versions of all three with a two routine jump table at the TOP of  golden RAM ($07FA and $CFFA for CX16 and C64 respectively), one for entering Sweet16, the other for entering either SAVE or RESTORE (based on carry set or carry clear). Then the starting point of the routine is flexible, C64 code could enter Sweet16 with JSR $CFFE and CX16 code with JSR $07FE.

That would make it possible to assemble Sweet16 code independent of the choice of Sweet16 VM.

To fit into that, I'm going to shrink the size of my "two page" version by using INC Register and DEC Register subroutines, which will free up as much space as it frees up, and leave my "3 page" version as the full fat speed optimized version.

Edit: What I get is that the "full fat" Sweet16c would occupy $0500-$07FF of Golden RAM, leaving one page (256 bytes) free at $0400. The "two page" Sweet16c2 would occupy $061C-$07FF, leaving 540 bytes (two pages plus 28 bytes) of Golden RAM available at $0400. And the adapted "Sweet 16 original" with SAVE/RESTORE code included and the jump table would occupy $066F-$07FF, leaving 623 bytes (two pages plus 111 bytes) of Golden RAM free.
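For anyone who wants to redo this kind of arithmetic for a different build, a minimal Python sketch follows. It assumes CX16 Golden RAM spans $0400-$07FF, as in the post, and that the VM is packed against the top; `golden_ram_layout` is just an illustrative name.

```python
GOLDEN_RAM_START = 0x0400
GOLDEN_RAM_END = 0x07FF  # inclusive top of CX16 Golden RAM


def golden_ram_layout(vm_start: int) -> tuple[int, int]:
    """Return (vm_size, free_bytes) for a VM occupying vm_start..$07FF."""
    if not GOLDEN_RAM_START <= vm_start <= GOLDEN_RAM_END:
        raise ValueError("VM start must lie inside Golden RAM")
    vm_size = GOLDEN_RAM_END - vm_start + 1
    free_bytes = vm_start - GOLDEN_RAM_START
    return vm_size, free_bytes


if __name__ == "__main__":
    for name, start in (("full fat", 0x0500), ("adapted original", 0x066F)):
        size, free = golden_ram_layout(start)
        pages, extra = divmod(free, 256)
        print(f"{name}: {size} bytes used, {free} free ({pages} pages plus {extra} bytes)")
```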

TBC, none of those are tested code, so the final numbers may vary following bug fixes, but those should be the right ball park.

Edited by BruceMcF

  • 3 months later...

I've had a rethink on the three "unused ops" in Woz's Sweet16, and what I've decided is to use them as embedded calls for Machine Language routines. My first idea for calls was trying to make it possible to make Kernel calls directly, but I've since realized that the ML code that is called can be a bridge routine, so it is not necessary to build register loading and retrieval into the Sweet16 operation ... the routine that is called can handle that as appropriate.

The first thing to do is to wedge it into Woz's original code base. What I've already done is insert "SAVE" and "RESTORE" in between his OPTBL/BRTBL data and the "SET" operation, which must occur on the 2nd address (or later) of the page holding the Sweet16 ops themselves, since he dispatches with the page address pushed onto the stack, then the table entry, which is (opaddress-1), pushed onto the stack, and then RTS to dispatch the operation.

I also have the codebase END with "JMP Sweet16", so that Sweet16 VM's of different sizes can be placed at the END of GoldenRAM and be called with a stable entry point.

This leaves 3 bytes leeway, in which I put JMP SYSOP, which calls the common routine that executes one of the SYS operations.

I have three "SYS" calls. All SYS calls jump through an indirect call via register 13, the register used by CPR to store the results of a comparison operation. "SYSR n" uses the contents of the register pointed to by the status byte, which is most often Register 0, the Sweet16 accumulator. The current status of CARRY is in the carry flag when executing the call.  "SYS13" uses the current contents of register 13 (and it is the user responsibility to make sure there hasn't been a CPR operation since it was loaded), and the carry flag is clear. For both SYSR and SYS13, the value of "n" is simply available for any use the called routine may wish to make of it.

"SYSZ n" loads R13 with the 16 bit value it finds at the zero page address "n". This is DESIGNED to allow the Sweet16 register with the target address to be specified with "SYSZ Reg0" through "SYSZ Reg14" (using the PC at R15 would not work, as it will contain "n" rather than a ML routine) ... but it CAN be used to execute ANY address in the zero page.

At one and the same time, these SYS operations allow the writing of bridge routines to call Kernel routines, as well as routines to extend Sweet16 to include any desired operation.  Indexed calls are available by simply using Sweet16 ADD operations and "SYSR" on the result.

Note that "SYSZ" uses a zero page address, not a register number like the Sweet16 instruction codes. A convenient way to include Sweet16 code in your assembly code is to define the opcodes and registers as named byte symbols and use your byte data pseudo-op to include the code. Bitwise OR ("|" in ACME) can be used for the 15 operations that embed their target register in their bytecode, with the register number given rather than the register address. An advantage of this is that "extended" Sweet16 code with SYSZ can be ported between Apple systems, based on 16 pseudo-registers at $00-$1F, and the C64/CX16, based on 16 pseudo-registers at $02-$21, by simply re-assembling with the register symbols set correctly.

Placing $0416 in Register 10 would be done with
  !byte ..., SET|10, $16, $04,...

Then using that register to call the routine at the Golden RAM location $0416 would be done with:
  !byte ..., SYSZ, Reg10, ...
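To see what bytes those two `!byte` fragments would produce on each platform, here is a small Python model. It is a sketch under assumptions: `SET` is the standard Sweet16 $1n opcode, while the `SYSZ` value of $0F is only inferred from the "2*SYSZ = $1E" comment in the wedge code posted later in this thread, and the helper names are made up.

```python
SET = 0x10   # Sweet16 "SET Rn, constant": opcode $1n, then low byte, high byte
SYSZ = 0x0F  # assumed opcode value, inferred from "X = #$1E = 2*SYSZ"

APPLE_R0 = 0x00  # pseudo-registers at $00-$1F
CX16_R0 = 0x02   # pseudo-registers at $02-$21 (C64/CX16)


def reg_addr(reg: int, r0_base: int) -> int:
    """Zero page address of the low byte of Sweet16 register `reg`."""
    if not 0 <= reg <= 15:
        raise ValueError("Sweet16 has registers 0-15")
    return r0_base + 2 * reg


def call_via_reg10(target: int, r0_base: int) -> bytes:
    """Bytecode for: SET R10, target ; SYSZ Reg10."""
    return bytes([
        SET | 10, target & 0xFF, target >> 8,  # register number embedded in the opcode
        SYSZ, reg_addr(10, r0_base),           # SYSZ takes a zero page address instead
    ])


if __name__ == "__main__":
    print("CX16: ", call_via_reg10(0x0416, CX16_R0).hex(" "))
    print("Apple:", call_via_reg10(0x0416, APPLE_R0).hex(" "))
```

Only the final byte differs between the two targets, which is the point of defining `Reg10` as a symbol rather than hard-coding the address.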

 

Edited by BruceMcF

On 4/24/2022 at 2:51 PM, BruceMcF said:

I've had a rethink on the three "unused ops" in Woz's Sweet16, and what I've decided is to use them as embedded calls for Machine Language routines. My first idea for calls was trying to make it possible to make Kernel calls directly, but I've since realized that the ML code that is called can be a bridge routine, so it is not necessary to build register loading and retrieval into the Sweet16 operation ... the routine that is called can handle that as appropriate.

The first thing to do is to wedge it into Woz's original code base. What I've already done is insert "SAVE" and "RESTORE" in between his OPTBL/BRTBL data and the "SET" operation, which must occur on the 2nd address (or later) of the page holding the Sweet16 ops themselves, since he dispatches with the page address pushed onto the stack, then the table entry, which is (opaddress-1), pushed onto the stack, and then RTS to dispatch the operation.

I also have the codebase END with "JMP Sweet16", so that Sweet16 VM's of different sizes can be placed at the END of GoldenRAM and be called with a stable entry point.

This leaves 3 bytes leeway, in which I put JMP SYSOP, which calls the common routine that executes one of the SYS operations.

I have three "SYS" calls. All SYS calls jump through an indirect call via register 13, the register used by CPR to store the results of a comparison operation. "SYSR n" uses the contents of the register pointed to by the status byte, which is most often Register 0, the Sweet16 accumulator. The current status of CARRY is in the carry flag when executing the call.  "SYS13" uses the current contents of register 13 (and it is the user responsibility to make sure there hasn't been a CPR operation since it was loaded), and the carry flag is clear. For both SYSR and SYS13, the value of "n" is simply available for any use the called routine may wish to make of it.

"SYSZ n" loads R13 with the 16 bit value it finds at the zero page address "n". This is DESIGNED to allow the Sweet16 register with the target address to be specified with "SYSZ Reg0" through "SYSZ Reg14" (using the PC at R15 would not work, as it will contain "n" rather than a ML routine) ... but it CAN be used to execute ANY address in the zero page. ...

The wedge into Woz's Sweet16 is something like (if the registers are not at $00-$1F ... in the original AppleII registers, "SEC : SBC #R0L" can be omitted):

Quote

SYSOP:
   CPX #$1C    ; X = #$1C = 2*SYSR?
   BEQ SYS1    ; If so, test register index is in A
   BMI SYS2    ; X= #$1A = 2*SYS13, no loading needed
   LDY #0        ; Else X = #$1E = 2*SYSZ
   LDA (R15L),Y    ; ZP address is at (Reg15)
   SEC        ; Adjust to use R0L,X indexing
   SBC #R0L
   CLC
SYS1:
   JSR SYS3    ; Fetch vector into Reg13, then use
   RTS
SYS2:
   CLC        ; Vector already in Reg13, just use
   JSR SYS4
   RTS

SYS3:
   TAX             ; Load Reg13 if needed, ...
   LDA R0L,X
   STA R13L
   LDA R0H,X
   STA R13H
SYS4:
   JMP (R13L)    ; Vectored jump based on (Reg13)

 

Swift16 will be similar, but will be able to jump directly to the SYSR, SYS13 and SYSZ operations, since Swift16 operations do not have to start executing in the same page.

Edited by BruceMcF

On 4/25/2022 at 10:30 AM, rje said:

That's a clever use of the assembler to write target-agnostic Sw*t16.

One thing to be careful of is that the code using the Sweet16 VM cannot be in the same namespace as the code implementing the Sweet16 VM, because the namespace uses the "plaintext" names of the operations as addresses of the implementation of the operation, while the code using the Sweet16 VM would have those defined as symbols for the opcode of those operations.


On 4/25/2022 at 6:02 PM, BruceMcF said:

The wedge into Woz's Sweet16 is something like (if the registers are not at $00-$1F ... in the original AppleII registers, "SEC : SBC #R0L" can be omitted):
"..."

Swift16 will be similar, but will be able to jump directly to the SYSR, SYS13 and SYSZ operations, since Swift16 operations do not have to start executing in the same page.

Waitaminute! I just realized that the Sweet16 "status" register is the HIGH byte of Register 14 ... if Register 13 is being "reused" as the temporary store for the SYS jump vector ... so can the low byte of Register 14, allowing a complete JMP() instruction to be built IN the Sweet16 register space. Instead of SYS13, I can have a SYSZ call with a one byte zero page address, and a SYSM call with a two byte absolute address, which can increment Reg15 and use it to grab the high byte of the address.
 

Quote

 

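SYSOP:
  LDY #0  ; not used in 65C02
  CPX #$1C    ; 2*$0E = 2*SYSZ = $1C; note CPX sets carry when X >= $1C
  BCC SYS2    ; 2*$0D = 2*SYSR = $1A -- contents of A is a zero page address, carry is clear
  BNE SYS1    ; 2*$0F = 2*SYSM = $1E -- carry stays set to fetch high byte
  CLC         ; SYSZ -- clear carry so only the zero page address is fetched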
SYS1:
  LDA (R15L),Y ; fetch zero page address, "LDA (R15L)" in 65C02
SYS2:
  STA R13H ; low byte of JMP() operand
  LDA #$6C ; JMP() opcode
  STA R13L
  TYA ; for zero page addressing
  BCC SYS4
  INC R15L
  BNE SYS3
  INC R15H
SYS3:
  LDA (R15L),Y
  CLC
SYS4:
  STA R14L
  JMP R13L

I'm thinking the Swift16 version would be basically the same, but with three entry points because of no need to "wedge" the call into the common Sweet16 VM opcode page:

Quote

SYSR:
  CLC
  BRA SYS2
SYSZ:
  CLC
  BRA SYS1
SYSM:
  SEC
SYS1:
  LDA (R15L) ; fetch first byte of operand
SYS2:
  STZ R14L ; High byte of operand for zero page addressing
  STA R13H ; store first byte of operand
  LDA #$6C ; JMP() opcode
  STA R13L ; JMP() instruction is now built
  BCC SYS4
  INC R15L
  BNE SYS3
  INC R15H
SYS3:
  CLC
  LDA (R15L)
  STA R14L
SYS4:
  JMP R13L ; returns to Sweet16 VM executive loop

 

Edited by BruceMcF


I've been thinking on this, and think that while I was getting closer, I was fighting Sweet16 too much, rather than going along with it.

Given that the calls are going to be machine language routines providing operations IN the Sweet16 source code -- whether all new operations or bridge calls to Kernel calls -- they can be packaged into a jump table or vector table format for access, so what is really needed is an INDEXED machine language call. That fits well with the single byte operand of the Branch operations.

Also, while the contents of Reg13 are purely transitory, since they are overwritten by each CPR operation ... if a "JMP addr" or "JMP (addr)" instruction is written in R13L, R13H and R14L, then these operations can take advantage of the fact that R14L is a "free" single byte register (the "high" byte, R14H, is constantly over-written to point to the register that Zero/Nonzero, Minus1,NotMinus1 refer to after load, arithmetic and comparison operations), so unlike the JMP opcode and the low byte of the operand, the high byte of the operand in R14L can stay resident.

Which leaves me at TWO TYPES of operation, making up the "Tabled System Calls" opcodes: "TBL page", which sets R14L to the desired page (high byte address), and the "SYS n" and "SYSI n" operations, which perform either a jump TO the nth byte of the table page or a jump using the VECTOR at the nth byte of the table page.

In the "Sweet16 wedge", included with the block of SAVE and RESTORE code between the opcode tables and the "opcode page", this would be something like:
 

Quote

; $0D -- TBL n -- set binary page (high address byte) used for SYS calls
; $0E -- SYS n -- Jump to indexed address of table page
; $0F -- SYSI n -- Jump using indexed vector of table page

SYSOP:    LDA #$4C
    CPX #$1C
    BMI SYS2
    BEQ SYS1
    LDA #$6C
SYS1:    STA R13L
    LDY #0
    LDA (R15L),Y
    STA R13H
    JMP R13L

SYS2:    LDY #0
    LDA (R15L),Y
    STA R14L
    RTS

where the "Swift16" version would be something like:

Quote

; $0D -- TBL n -- set high page of SYS calls
; $0E -- SYS n -- Jump to indexed address of table page
; $0F -- SYSI n -- Jump using indexed vector of table page

SYS:    LDA #$4C
    BRA +
SYSI:    LDA #$6C
+  STA R13L
    LDA (R15L)
    STA R13H
    JMP R13L

TBL:    LDY #0
    LDA (R15L),Y
    STA R14L
    RTS
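The effect of building a JMP instruction inside the register space can be modelled in Python. This is only an illustration of the data flow, not a 6502 emulator: the memory is a plain `bytearray`, the register address $1C for R13L assumes the C64/CX16 layout with R0 at $02, and the function names are hypothetical.

```python
JMP_ABS = 0x4C  # JMP addr
JMP_IND = 0x6C  # JMP (addr)

R13L = 0x02 + 2 * 13  # $1C, assuming registers start at $02; R13H and R14L follow


def tbl(mem: bytearray, page: int) -> None:
    """TBL page: R14L keeps the table page (high byte of the JMP operand)."""
    mem[R13L + 2] = page & 0xFF


def sys_target(mem: bytearray, n: int, indirect: bool) -> int:
    """SYS n / SYSI n: write the opcode and low operand byte, return the final target."""
    mem[R13L] = JMP_IND if indirect else JMP_ABS
    mem[R13L + 1] = n & 0xFF                      # R13H: low byte of the operand
    addr = mem[R13L + 1] | (mem[R13L + 2] << 8)   # operand = page:n
    if indirect:
        # JMP (addr) continues at the vector stored at addr and addr+1.
        addr = mem[addr] | (mem[(addr + 1) & 0xFFFF] << 8)
    return addr


if __name__ == "__main__":
    mem = bytearray(0x10000)
    tbl(mem, 0x04)                        # table page $04xx
    mem[0x0420:0x0422] = b"\x34\x12"      # a vector to $1234 stored at $0420
    print(f"SYS $16  -> ${sys_target(mem, 0x16, False):04X}")
    print(f"SYSI $20 -> ${sys_target(mem, 0x20, True):04X}")
```

Because TBL only touches R14L, the page stays resident across any number of SYS and SYSI calls, while the opcode and low byte are rewritten each time.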

 

 

Edited by BruceMcF


After the effort of trying to "crunch" the JMP (abs,X) approach to a Sweet16 VM couldn't beat Woz's code for compactness, I've evolved toward three versions: a slightly extended version of Woz's Sweet16 as the "compact" VM, Sweet16c as the "faster, though larger" 65C02 version, and a 65816 version of the VM that can execute mixed 6502/Sweet16 code, with the 6502 code running in emulation mode and the Sweet16 VM implemented in native 65816 mode.

Now, aside from porting the VM independent of Woz's code, I have two "new" things: the three new System Jump Table opcodes, and the jump table at the end allowing the same code on a system to be able to be used with a variety of Sweet16 VM implementations.

However, assembling the Woz code with my SYSOP "wedge", the page with the opcodes didn't have space for the jump table --  it came up three bytes short.
The first opcode routine has to be at address $01 or higher in the page, because first ">SET" is loaded with "LDA #>SET" and pushed onto the stack, and then the bottom byte of the subroutine return vector is defined with, e.g., "<SET-1". But if SET is at (e.g.) $0700, then ">SET" is $07 and "<SET-1" is $FF, because SET-1 is $06FF. But then the subroutine return vector on the stack is, effectively, $07FF, which returns to $0800 ... oops!
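The page-boundary trap is easy to reproduce in a few lines of Python. This models only the address arithmetic of the push-push-RTS dispatch (RTS pulls a 16-bit address and adds one); `rts_dispatch_target` is an illustrative name, and the page byte is assumed to be `>SET` as described above.

```python
def rts_dispatch_target(routine: int, set_addr: int) -> int:
    """Address reached by pushing >SET, then <(routine-1), and executing RTS."""
    hi = (set_addr >> 8) & 0xFF       # LDA #>SET : PHA -- fixed page of the op routines
    lo = (routine - 1) & 0xFF         # table entry: low byte of (routine - 1)
    return (((hi << 8) | lo) + 1) & 0xFFFF  # RTS resumes at pulled address + 1


if __name__ == "__main__":
    for set_addr in (0x0701, 0x0700):
        target = rts_dispatch_target(set_addr, set_addr)
        print(f"SET at ${set_addr:04X}: dispatch lands at ${target:04X}")
```

With SET at $0701 the dispatch lands on SET itself; with SET at $0700 the low byte wraps to $FF while the page byte stays $07, so control ends up a full page too high.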

To be clear, the idea is to tuck the VM up "high" in a memory space ... the top of "Golden RAM", or the top of a HighRAM segment, or etc. The "high entry point" when added to Woz's original VM really has to fit into the end of the same page that has the opcodes.

But if placing the first opcode routine at one past the page boundary, my precious two-operation jump table spills three bytes out of Golden RAM!

The first trick is following Woz's lead with "BPL SETZ" being an effective "BRA SETZ", because branch ops are called after loading A with the offset from Register0 of the register that the status is based on, so the sign flag should always be clear when starting execution of a "Branch Op".

I had already done that with "BPL SYSOP" ... but Woz placed "RTN: JMP RTNZ" at the end of his code. Replacing that with a "RTN: BPL RTNZ" in front of "SET: BPL SETZ" saves one byte.

And then the second trick was a design simplification, winnowing the jump table to just the single "JMP SWEET16". The idea of the second routine in the table was to export the Save/Restore register routines, but it is possible to set things up so that their addresses can be inferred, so I've settled for that.

Now it all JUST fits. And  ... with a single byte to spare!

NOTE: The idea I have been toying with, which makes direct access to register restore an issue for interspersed Sweet16 and 6502 code, is to make the state of carry significant when entering Sweet16: with carry clear, state is stored on entry and restored on exit, but not with carry set. So if originally called with carry clear, then returning to 6502 code for some task before returning to Sweet16 code with carry set, the ORIGINAL state stored when first entering Sweet16 is still there, and at the end of the WHOLE process, Sweet16 can return to 65C02 code which can end with a JUMP to restore the state, where the restore state subroutine returns to the caller. And of course, say, fetching the call address at the end of the Sweet16 VM, subtracting two from it and fetching the word at that address into a Sweet16 register (that the process won't be using) is a very short routine in Sweet16 code. If it was Reg11, the terminating 6502 code might end with JMP (Reg11) to restore the register state from when the whole combined routine was first called.

Edited by BruceMcF


OK, cracked it. Since I have exactly one byte leeway in my "augmented version", what I am doing is this:

START POINT ; Doesn't have to be first byte of VM, but often is
   JSR PUTSTATE
   ...

GETSTATE:
   LDA REGP
   PHA
   LDA REGA
   LDX REGX
   LDY REGY
   PLP
   RTS
 

PUTSTATE:
   PHP
   STA REGA
   STX REGX
   STY REGY
   PLA
   STA REGP
   RTS

...

GS_OFFSET: !byte (PUTSTATE - GETSTATE)
; ENTRY POINT
   JMP SWEET16

... In other words, the final word of the VM is implicitly a handle for SAVE ... it contains a pointer to one less than the location of the pointer to the SAVE routine. So if I know how far before SAVE the RESTORE routine, aka GETSTATE, is located (within 255 bytes), I can build my own jump table or vector table. That offset is contained in the byte before the entry point.

The limitations on ANY Sweet16 VM using this system would be that the SAVE routine must FOLLOW the RESTORE routine, and be within 255 bytes of it.
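Here is a Python sketch of how client code might recover the two routine addresses from the top of the VM, following the layout in the listing above. It assumes the start point really does begin with `JSR PUTSTATE`; the memory image, the addresses in the demo, and the name `find_state_routines` are all made up for illustration.

```python
def find_state_routines(mem: bytearray, vm_top: int) -> tuple[int, int]:
    """Return (GETSTATE, PUTSTATE) given the address of the last byte of the VM.

    Layout assumed at the top of the VM:
      vm_top-3: GS_OFFSET byte (PUTSTATE - GETSTATE)
      vm_top-2: $4C (JMP), vm_top-1 / vm_top: low / high byte of the start point
    """
    start = mem[vm_top - 1] | (mem[vm_top] << 8)         # operand of JMP SWEET16
    offset = mem[vm_top - 3]                             # GS_OFFSET
    putstate = mem[start + 1] | (mem[start + 2] << 8)    # operand of JSR PUTSTATE
    return putstate - offset, putstate


if __name__ == "__main__":
    mem = bytearray(0x10000)
    # Hypothetical image: start point at $0670, GETSTATE at $0680, PUTSTATE at $068B.
    mem[0x0670:0x0673] = bytes([0x20, 0x8B, 0x06])            # JSR PUTSTATE
    mem[0x07FC:0x0800] = bytes([0x0B, 0x4C, 0x70, 0x06])      # GS_OFFSET, JMP $0670
    getstate, putstate = find_state_routines(mem, 0x07FF)
    print(f"GETSTATE=${getstate:04X} PUTSTATE=${putstate:04X}")
```

The single offset byte is why SAVE has to follow RESTORE within 255 bytes: the lookup subtracts an unsigned 8-bit value from the SAVE address.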

It is arbitrary which one must be first, so this follows the Apple2 ROM addresses of register "SAVE" at $FF4A and register "RESTORE" at $FF3F, so a RAM based "augmented Sweet16" for an Apple II could re-use the Apple II ROM SAVE and RESTORE.

For the direct additions to Woz's original Sweet16 source code, I don't have an open source licensed copy (even if clearly Woz won't mind!), but I can distribute additions to the source available at 6502.org, so those must follow the naming in the original. For my own implementation, I avoid calling them "SAVE" and "RESTORE" to avoid confusion with C64 KERNAL / CX16 Kernal routines.

 

Edited by BruceMcF