Jump to content

Notes on Pascal-M models for the CX16


Recommended Posts

Posted (edited)

A little while back, while researching 6502 bytecode interpreters when I was working on the "SweetCX16" implementation of Steve Wozniak's venerable "Sweet16" virtual machine for the Apple II, I came across Pascal-M, an implementation of Pascal for the early KIM-1 8bit 6502 system based on a bytecode virtual machine (VM). The last update I knew of was the version 2 being ported to run on FreePascal.

Recently the original V1 Pascal-M was restored (recently as in 2020! ... so there may have been a bit of a "lockdown project" to it like medievalist armorer Tod Cutler's "Lockdown Longbow" series).

Now, Pascal-M is a lot more primitive than UCSD's bytecode Pascal. It's all upper case, no file I/O in version 1 (version 2 adds it), and of course it runs on a SLOW virtual machine. So why is this interesting as a target?

One thing that is interesting about this is the porting process. Basically, write the bytecode interpreter, including a loader to load a compiled bytecode program, then you can load the already-compiled compiler, then use it to compile the compiler AGAIN from the source, and if the original compiler you started with and the compiler that you created are exactly the same, you can be pretty sure that your bytecode interpreter has worked.

And while the interpreter should be written in assembly, a lot of it is straightforward. You write a bunch of primitives that do relatively straightforward things within the VM system model, each of them ending with a call to NEXTOP to execute the next bytecode. And while the Pascal-M documentation is rudimentary, it is originally based on the P2 VM p-code version of Pascal (which actually was NOT a bytecode) ... which is well documented, since in some computer science programs in some countries, students still "rebuild a Pascal compiler" as part of their first compilers class.

And the loader could literally be written in Basic.

While I have by no means started on the loader yet, I've got sketches for 35 bytecodes ... including the 8 that have an embedded parameter of 0-15 in the code, so occupy the first 128 bytes of the instruction set ... and have 21 to go. Now, that doesn't mean I'm 60% of the way done, because some of the ones I have not yet started on pack a wallop, like the opcode for the CASE statement which has the entire CASE jump table as it's operand. However, I may be over 25% of the way through the first draft of the interpreter in a combined few days of playing around with it.

Now, no doubt Pascal-M would be very slow ... m-code is not optimized for the 6502 to any degree other than being a byte code, where Nicholas Wirth's original pcode was based on 30 bit wide codes. But once the VM interpreter is written, there will be another working "high-ish for the 70s and 80s" language for the CX16.

Another appealing aspect of Pascal-M as a target is that almost the whole of Low RAM is available to it, and since it is a bytecode, the code density will be pretty good. The bytecode interpreter can actually be run out of a High RAM segment, so if any springboarding to other parts of High RAM can be done out of Golden RAM, the system will have 36K+ free for program code, stack, and heap.

Of course, being a VM, the virtual hardware can be redesigned in hardware. There is actually a bit of flexibility here, because there are three logical things ... the procedure CODE, the procedure stack data, both of which grow up, and the heap, which grows down, to work with.

That is, the m-code VM treats CODE and data, or "STORE", as different things, so if one is careful, it is possible to have in excess of 64K total addressable content, because CODE addresses and STORE addresses don't necessarily have to refer to the same memory location.

One model to look at is a "Big CODE" model, where the interpreter is placed into the high side of Low RAM, nestled somewhere up right before the I/O page at $9F00 ... and the CODE is run out of the High RAM window. Then the Heap and the data part of the Stack has 30K+ to work with while the code can extend for up to 64K. The amount of functionality that could be included in a program is increased by the fact that the Pascal system is designed to be able to load and unload distinct segments so, for example, a main program could load a set-up/initialized segment, and then free it up to load the procedures that are going to use the initialized data.

Another to look at is a "Big HEAP" model, where the bytecode operands that store things to general memory treat "STORE" addresses from $8000 up as coming from one of up to four High RAM segments, so the Heap extends from a virtual $FFFF to $8000, while the stack and program extends from $0801 until the start of the interpreter or $7FFF, whichever comes first.

Indeed, although I need to get further into the project than I am now to see if its possible, it might be possible to combine both of those. The VM codes that pull data from code generally push it onto the stack, and the access to data in the Heap is generally to and from the stack or the direct MOV instruction. So it may be that the trickiest part of combining a "HighRAM CODE" and "HighRAM Heap" model was already done when the High RAM Heap is developed, including the various cases that MOV must handle when moving data from one place to another, especially moving data between different parts of the Heap when they are not in the same HighRAM segment.

As some of us will recall quite well from Basic in the 80s, sometimes all that is needed to get something to run "fast enough" is to identify the worst bottlenecks and move them to assembly. So one of the appeals of starting with m-code is that it DOES have a "call assembled procedure" code. That's one of the 35 I have already sketched:

Quote

 

; BB X X, CAP Call assembly procedure
; The CAP instruction is used to call an assembly-language routine previously
; loaded into a specified address. The actual action of this instruction may
; be considered to be system-dependent.

; NOTE: The assembly procedure must end with RTS

CAP:    LDY #1
    LDA (PC),Y
    STA T
    INY
    LDA (PC),Y
    STA TH
    JSR TCALL
    LDA #2
    JMP NEXTCODE
TCALL:    JMP (T)

 

Now, to be clear, I am not reckoning that Pascal-M on the CX16 will "take over and reform the world!!!" ... I am not even expecting that it would ever be heavily used by anybody. However, it's still an appealing project to take something that ran on the KIM-1, MOS Technologies first 6502 computer ... pre-PET, pre Apple II ... and port it to the CX16.

spacer.png

Oh, by the way, that single KIM-1 card cannot possibly run Pascal-M ... it only has 1K of RAM! ... you would need something like a 32K RAM expansion card:

spacer.png

Edited by BruceMcF
  • Like 3
Link to comment
Share on other sites

Posted (edited)

"Hits back of own neck" ~ I was judging the size of the compiler by the size of the object file ... but the "19K compiler" is not a binary, it's a text based object file format.

That is, the object file is in a text format of lines mostly starting with "P" and then a digit to pick which kind of object line it is and then bytes which are in hexadecimal format to convert to the binary in the CODE space. So the programs are under half the size that I was thinking. A compiler program with a 19K filesize is less than 10K. Indeed, it would almost certainly be less than 8K, since some of the object lines are patches to generate the code address of forward references, and some is static data which could be stored in top of the data storage area, with the heap growing down from the bottom of the static data.

So I might go with a "SMALL" model of interpreter sitting where it loads at $0801, stack growing up from the end of the interpreter, heap growing downward from $9DFF, a 256 byte "mark stack" at $9E00-$9EFF, and CODE at $A000-$BFFF. Then a "MEDIUM" model could have three High RAM banks for programs and CODE addresses in the range of $A000-$FFFF. If the Compiler is somewhere in the rank of a single High RAM Segment, then 24K would be probably bigger than any Pascal-M program I would be likely to write.

That also saves times in bringing the system up for the first time, since the first version of the object loader program could be a Basic program that gives a menu of m-code object files and when you pick one, it loads it and builds the program in the High RAM segment, sets up the zero page, and then loads the interpreter, which starts right into a warm start and starts executing the program at "Virtual" $A000. A loader in Forth or Assembly would be faster, but that would work for getting started.

Edited by BruceMcF
  • Like 1
Link to comment
Share on other sites

Posted (edited)
4 hours ago, snake said:

Have you seen this P65PAS?

https://github.com/t-edson/P65Pas

No, but since Pascal-M would be hosted on the CX16, they are entirely different beasts. I am not playing around with writing an m-code interpreter because I have a desire to program the CX16 in Pascal, but as part of developing language tools hosted on the CX16.

Edited by BruceMcF
  • Like 2
Link to comment
Share on other sites

I've been struggling with an interpreter from the software-engineer side of things.  Tokenizers are easy, VMs are easy, compilers... are tricky for me.

 

Link to comment
Share on other sites

  • 2 weeks later...
Posted (edited)

Also, I never did much programming in Pascal, and I misread the Kim1 documentation on the m-code and the Pascal model. With the much clearer documentation of the FreePascal code for a generic m-code interpreter, I see that I was implementing the stack and heap the wrong way around. That is not a problem to fix up, though: it is actually easier doing it correctly, since with the (S) in zero page pointing to the bottom of the stack, (S),Y can easily reference all of the bytes in the built in operands -- at most 16bytes for operations on a pair of sets, which are 8byte membership bitmaps for up to 64 items.

With the relocatable bytecode allowing for an ability to load a program segment and then recover its space and load another one in other versions of P2 Pascal, I am now leaning toward keeping the HighRAM page available for an external code segment.

The procedure and functions that a program refers to are converted into an index number by the compiler, and are called by index number. The simplest way to implement that is to set aside two pages to hold vectors, and shift the index byte, using the carry bit to select between the two pages and the shifted lower 7bits of the index as the index to the address of the procedure/function.

So the idea is that if the address of the routine is below $8000, it's a routine in the base compiled code. If the address of the routine is above $8000, it's a routine at a location in a HighRAM page. Requiring it to be loaded at an aligned 8byte boundary means that 15bits can refer to a procedure stored anywhere in the first 1MB of potential HighRAM. Only the procedure and function call and return operations have to worry about whether it is a base or extended routine, and keeping track of which it is at present only requires one byte in zero page that has the #0 for a core procedure/function and the location HighRAM segment.

AFAIU, it's not Pascal-M that actually implemented the loadable code segments, but it was done in another P2 based Pascal, so there should be a scaffolding already in place for pursuing that approach: most importantly, already existing example Pascal code P2 code .

As far as using HighRAM for data, I am thinking that a RAMdisk is a pretty good approach. In this case, there ARE more developed versions of Pascal-M that have more file functions than Version 1, which has console input and output, a single stored input and a single storable output ... essentially a paper punch or cassette level of mass storage. That makes for an easy to implement RAMdisk at the start, where the program input file is loaded into HighRAM before loading the executable, and the file to save the resulting output file can be selected up front and the saving done when returning from executing the compiled Pascal program. So the program loader is the only part that needs to know how to work with actual CX16 file I/O and incorporating file support into the compiler can be postponed until later.

 

Edited by BruceMcF
Link to comment
Share on other sites

Bruce, are you storing your work on Github?  Because there is a non-zero chance that folks might see if there are bits that they could help with.  Or just appreciate your work.

 

Link to comment
Share on other sites

And, while the two projects are not related, I am wondering if your Sweeter16 code can assist your work here. 

No I don't have anything in particular in mind; I'm just thinking that these sorts of tools work best when you program bottom-up, starting with utilities that do simple things, then more complex things that use those simple things...     

...which you've already mentioned, in a way, in your initial post.

 

Link to comment
Share on other sites

57 minutes ago, rje said:

Bruce, are you storing your work on Github?  Because there is a non-zero chance that folks might see if there are bits that they could help with.  Or just appreciate your work.

 

When I have something that can be assembled, I definitely will put it up on Github ... right now and for the next couple of weeks, I am still at the puttering around phase.  I was able to borrow from xForth for the integer multiply and divide, but using a zero page indirect "LDA (S) : LDY #1 : ORA (S),Y : STA (S),Y : JSR DROP2" stack makes it very much it's own thing versus either SweetX16 or x4th.

Link to comment
Share on other sites

Posted (edited)

I figured out something that I found a little cool over the past two days.

A core part of the mcode interpreter (see, for example, this pdf scan of the documentation of the mcode interpreter) is the way that relative addressing is handled. 8 of the instructions have an embedded 4bit parameter that refers to the call level of the data, where the outer block if 0, a procedure it calls is 1, and so on, to a maximum depth of 15 below the outer block. The parameter is relative to the current call level, so a variable defined on the stack within the current level would be a parameter of "0", while if the current level is six below the outer block, a variable defined within the outer block would be a parameter of "6". Then there is one or two bytes that give the offset from the base of the stack for that level.

The entire start of the stack frame is (the stack grows down):

  • (... locals / working values of calling procedure)
  • [Space for return value if this is a function rather than a procedure]
  • Base Address for this procedure
  • Mark pointer of calling procedure
  • Program counter value of calling procedure
  • (... locals and working values of current procedure...)

The way that the Kim-1 code (zip file) did this was the way envisioned in the original language system design. There is a mark pointer that points to the marker position for the current procedure, and the start of the current procedure holds the mark pointer for the procedure that called it, and so on. So you get the mark pointer for the current level, check if the call level parameter is 0, if not, you fetch the value it points to and decrement the level, and repeat until the level count hits zero.

Needless to say, this is REALLY SLOW on the 6502 if you are 6 call levels deep and referencing a value created at the base level. But because of the way that Pascal was defined, lots of variables get created at the base level.

So what I have sketched instead is an X-indexed stack of call level based addresses. It is a 16 integer stack, so 32 bytes (probably on Low RAM rather than the zero page, to preserve some space for Assembly Language procedures), with the level of the index stored in a zero page byte, LEVEL. The parameter been shifted one left by the time this code is reached, so the parameter is now even numbers from 0 to 30. Add LEVEL to the parameter, put the index into X, and then the base address is available to have the offset parameter subtracted from it.

Quote

 

LEVELS:
    BPL LDCIS    ; LDCIS = $0x --> [$0x - $90] = $7x
    LDY #1        ; Save these 8 doing this individually
    ASL        ; LDAS-STR2 are $0x-$Fx, Carry is Set
    STA Z
    BMI +        ; LOD1 / LOD2 / STO1 / STO2 are $8x - $Fx
    CMP #$20
    BPL MSTN    ; LDAA is $2x/$3x, MSTO / MSTN are $4x - $7x
+    AND #$1E    ; Level index
    CLC
    ADC LEVEL
    TAX
    LDA LEVELL,X
    SEC
    SBC (PC),Y
    STA T
    LDA LEVELH,X
    SBC #0
    STA TH
; Parse instruction - lower 5bits have done their job
    LDA #$20
    BIT Z        ; former bits 5, 6, 7 in Z, V and S flags, respectively
    BPL LDAS    ; LDAS if Bit6 clear
    BVC LOD2    ; LOD1 / LOD2 if Bit6 set and Bit5 clear
    BEQ STR1
; 8X YY. STR2 - Store 2-byte data item into memory
STR2:    LDA (S),Y
    STA (T),Y

; 7X YY, STR1 ?Store 1-byte data into memory
; The STR1 instruction pulls one 16-bit item off the stack and stores the
; least significant byte into the YYth byte at the Xth level.
STR1:    LDA (S)
    STA (T)
    JSR POP2
    LDA #2
    JMP NEXTOPA

 

Now, while it does have the characteristic verbose character of 65c02 assembly working with 16bit values, it is MUCH faster in getting an offset computed for anything outside of the current procedure.

Now, in the m-code interpreter for the Kim-1, the base address is a dummy, because all offsets are computed by the interpreter relative to the position of the mark pointer. And with this "call LEVEL" stack, the marker pointer on the stack is also redundant. So that means that the program counter to return to is the only value needed for the stack frame.

If those two redundant values are removed, this means that the base address is the address of the stack pointer at the time the stack frame is being created. Just store that into the call LEVEL stack.

And THAT means that there is no need for dummy base address or mark pointer positions on the main system stack. It can just be:

  • (... locals / working values of calling procedure)
  • [Space for return value if this is a function rather than a procedure]
  • Program counter value of calling procedure
  • (... locals and working values of current procedure...)

... saving four bytes per procedure call.

Edited by BruceMcF
  • Like 2
Link to comment
Share on other sites

On 6/14/2021 at 9:59 AM, rje said:

Very nice.  I fully appreciate leveraging the CPU's addressing mode to boost performance of the higher-level language...

... but man would it be a lot shorter and faster if I could leverage the 65816's addressing modes.

Link to comment
Share on other sites

1 hour ago, paulscottrobson said:

With something like 6502 Pascal there's a reality that while the runtime will be fine to do a great deal, significant chunks of anything speed orientated (e.g. action games) will require assembler linked in.

Yes, the mcode has a Call Assembly Procedure bytecode.

  • Like 1
Link to comment
Share on other sites

Getting there ... maybe, since these are just sketches. But I have sketches for every Version 1 bytecode except the one that does the For loop and the one that does the Case statement, and then mostly the I/O standard procedures to go.

I haven't started on the loader, but it is going to be the easiest part. Plus, half of my classes for the semester ended in week 14 and after that I started getting caught up with grading for the other half that end in week 19, which is another two weeks, so I may be able to do the loader in a couple of longer programming stretches with actual assembly and debugging and the whole shebang over a coming weekend.

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...

Important Information

Please review our Terms of Use