Paging of stack


On 5/3/2021 at 11:51 PM, Roman K said:

@ZeroByte I mean overriding the memory, not another device. Like my device with RAM on it listens to the address bus and responds to the CPU if some memory address is requested. If there is memory chip already responsible for that address, how is that conflict resolved? That should happen on real devices. Or there is a dedicated IO range that is not handled by RAM and can be used by devices?

There is a dedicated I/O range that can be used by devices: the eight sets of 32-byte control register addresses in the I/O page at $9F00-$9FFF, of which three sets are used by the system and five are available on the expansion slots. Beyond that, DMA ("Direct Memory Access") would be the way to go: just take over the bus and write the data directly to the desired RAM location. At the upper limit, where only one side of the transfer is on the CX16 bus, that can proceed at one byte per cycle, so a "binary page" of 256 bytes can be moved in 256 cycles, whereas a general purpose (zp),Y copy is:

COPY: LDY #0
COPY1: LDA (SRC),Y : STA (DEST),Y : INY : BNE COPY1
COPY2: RTS

Where if that is page-aligned, that is 16 cycles per byte moved (5 for the LDA, 6 for the STA, 2 for the INY, 3 for the taken BNE) plus overhead.
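As a rough cross-check of that comparison, the per-byte cost of the copy loop can be tallied from the standard 65C02 instruction timings. This is only a sketch: the names `COPY_LOOP`, `page_copy_cycles` and `microseconds` are made up for illustration, and the 8 MHz clock is the figure used elsewhere in this thread.

```python
# Cycle counts from the standard 65C02 timing tables, for the COPY1 loop above.
COPY_LOOP = {
    "LDA (SRC),Y": 5,   # +1 only when indexing crosses a page; not the case if page-aligned
    "STA (DEST),Y": 6,  # always 6 cycles
    "INY": 2,
    "BNE COPY1": 3,     # taken branch, target in the same page
}

CPU_HZ = 8_000_000  # assumed clock speed


def cycles_per_byte(loop=COPY_LOOP):
    """Cycles spent per byte moved by one pass of the loop."""
    return sum(loop.values())


def page_copy_cycles(nbytes=256):
    """Whole routine: the last BNE falls through (2 cycles, not 3),
    plus LDY #0 (2 cycles) and RTS (6 cycles) as overhead."""
    return nbytes * cycles_per_byte() - 1 + 2 + 6


def microseconds(cycles, hz=CPU_HZ):
    """Convert a cycle count to microseconds at the given clock."""
    return cycles * 1e6 / hz


if __name__ == "__main__":
    print("CPU copy:", page_copy_cycles(), "cycles,", microseconds(page_copy_cycles()), "us")
    print("DMA copy:", 256, "cycles,", microseconds(256), "us")
```

Under those assumptions the CPU copy of one page costs a little over 4,100 cycles, against 256 cycles for a one-byte-per-cycle DMA transfer.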

Now, a page at a time is a long time for interrupts to be suspended, so a more general purpose DMA board might be oriented around 16-byte, 32-byte or 64-byte chunks, depending on how long you figure you can leave interrupts suspended without messing with performance. If the DMA chunk reference autoincrements, and writing the low byte of the main bus target (destination) chunk register triggers the DMA, you might have a loop of:

PASTE: ; A=chunk lo, Y=target-hi, as chunk reference, X = #chunks, 0-base, DMA source address is already set-up
PASTE1: STY dmatrg+1 : STA dmatrg : DEX : BEQ PASTE3
PASTE2: INC dmatrg : DEX : BNE PASTE2
PASTE3: RTS

That's an inner loop of 1.2-1.7 clocks per byte moved, plus overhead, for chunks of 16-64 bytes. Ideally the auto-increment system for the DMA RAM would be the same auto-increment system as for Vera, both to reduce the amount of new things that need to be learned, and also to integrate with Vera.
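The 1.2-1.7 figure can be reproduced with a little arithmetic. The sketch below assumes the DMA itself moves one byte per cycle while the CPU is held off the bus, and that the inner loop is the `INC dmatrg : DEX : BNE PASTE2` sequence above with standard 65C02 timings; `clocks_per_byte` is a hypothetical helper name.

```python
# Inner loop cost per chunk: INC absolute (6) + DEX (2) + taken BNE (3).
INNER_LOOP_CYCLES = 6 + 2 + 3


def clocks_per_byte(chunk_bytes, loop_cycles=INNER_LOOP_CYCLES):
    """Clocks per byte moved: one DMA cycle per byte (assumed),
    plus the CPU loop overhead spread across the chunk."""
    return (chunk_bytes + loop_cycles) / chunk_bytes


if __name__ == "__main__":
    for chunk in (16, 32, 64):
        print(chunk, "byte chunks:", round(clocks_per_byte(chunk), 2), "clocks/byte")
```

So the loop overhead is a fixed 11 cycles per chunk, and larger chunks amortize it better at the price of holding off interrupts for longer.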

For Vera, and conceivably also for other I/O, you'd also have a fixed IO page target address function:

VPASTE: ; A=IOaddr, X = #chunks, 0-base, I/O target settings, DMA source address is already set-up
VPASTE1: STA dmaiotrg : DEX : BNE VPASTE1
VPASTE2: RTS

 


  • 3 weeks later...
30 minutes ago, Michael Kaiser said:

Ok. So copy zero page and stack to a bank for task 1 and store registers, then copy stack and zero page from a bank for task 2 and retrieve registers. Takes about 1.5 microseconds at 8 MHz. That might actually be usable.

And depending on the task, you may not need to copy it all. Move the stack down to the bottom quarter of the stack page and reserve a 64-byte section of zero page for the task, and you only need to copy two chunks of 64 bytes up and two chunks of 64 bytes down.
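To put rough numbers on the saving, here is a sketch of the bytes moved per task switch and the time that takes. The per-byte costs are assumptions: 16 cycles for a (zp),Y copy loop on a 65C02 and 1 cycle for an idealized DMA transfer, ignoring register save/restore and loop setup. The function names are hypothetical.

```python
CPU_HZ = 8_000_000  # assumed clock speed


def switch_bytes(zp_bytes, stack_bytes):
    """Bytes moved per task switch: the outgoing task's zero page and
    stack are copied out, then the incoming task's are copied in."""
    return 2 * (zp_bytes + stack_bytes)


def switch_us(nbytes, cycles_per_byte, hz=CPU_HZ):
    """Copy time in microseconds for a given per-byte cost."""
    return nbytes * cycles_per_byte * 1e6 / hz


if __name__ == "__main__":
    for label, zp, stack in (("full pages", 256, 256), ("64-byte chunks", 64, 64)):
        n = switch_bytes(zp, stack)
        print(label, n, "bytes:",
              switch_us(n, 16), "us by CPU copy,",
              switch_us(n, 1), "us by DMA")
```

Cutting each area from 256 to 64 bytes reduces the traffic from 1024 to 256 bytes per switch, so the copy time drops by a factor of four whichever way it is done.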


  • 1 month later...

Just thinking about this, I think I might have come up with a method for swapping out the zero page that's even faster than just getting the 65C02 to copy it at 8 MHz—store the other zero page banks in VRAM, and make use of the VERA's autoincrementing ports. I'm not sure of the exact cycle counts, but I would imagine you would save some cycles by only having to do one indexed access instead of two.
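A rough tally of the cycle counts, as a sketch only: it assumes simple `,X`-indexed loops, standard 65C02 timings with no page crossings, a VERA data port accessed with plain absolute addressing and no wait states, and it ignores the cost of setting up the VERA address and increment. The loop shapes and names below are illustrative, not taken from any existing code.

```python
# Standard 65C02 cycle counts (no page crossing).
CYCLES = {
    "LDA zp,X": 4, "STA zp,X": 4,
    "LDA abs,X": 4, "STA abs,X": 5,
    "LDA abs": 4, "STA abs": 4,
    "INX": 2, "BNE": 3,
}


def loop_cost(*ops):
    """Cycles per byte for one pass of a loop made of the given instructions."""
    return sum(CYCLES[op] for op in ops)


# Zero page <-> a buffer elsewhere in RAM: both accesses are indexed.
ram_save = loop_cost("LDA zp,X", "STA abs,X", "INX", "BNE")
ram_restore = loop_cost("LDA abs,X", "STA zp,X", "INX", "BNE")

# Zero page <-> VRAM through an autoincrementing data port: only the
# zero page side is indexed, the port is a fixed absolute address.
vera_save = loop_cost("LDA zp,X", "STA abs", "INX", "BNE")
vera_restore = loop_cost("LDA abs", "STA zp,X", "INX", "BNE")

if __name__ == "__main__":
    print("RAM  save/restore:", ram_save, ram_restore)
    print("VERA save/restore:", vera_save, vera_restore)
```

On these assumptions the gain is one cycle per byte, and only in the save direction, because an indexed absolute store costs a cycle more than a plain absolute store while the indexed load does not.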

On 4/30/2021 at 12:02 AM, kelli217 said:

Thought was seriously given to using a processor that supports the 65C02 instruction set but also already has relocatable stack and direct pages. However, there was still a problem; this processor has part of the address bus and data bus sharing the same lines via multiplexing, and the external demux logic was determined to be too much to deal with.

And that processor is the 65C816.

I thought the 65C816 was still a consideration (just not making use of its banking features), but they weren't going to test it until the design is otherwise finalized, which seems like an oddly suspenseful way to do it compared to just occasionally checking if the 65C816 works in the current prototypes.


10 hours ago, Serentty said:

Just thinking about this, I think I might have come up with a method for swapping out the zero page that's even faster than just getting the 65C02 to copy it at 8 MHz—store the other zero page banks in VRAM, and make use of the VERA's autoincrementing ports. I'm not sure of the exact cycle counts, but I would imagine you would save some cycles by only having to do one indexed access instead of two.

I thought the 65C816 was still a consideration (just not making use of its banking features), but they weren't going to test it until the design is otherwise finalized, which seems like an oddly suspenseful way to do it compared to just occasionally checking if the 65C816 works in the current prototypes.

Since the bus timings are tight, it really would be premature to check until they have a board working with the hardware they are going to use at 8MHz.

But I also would not be surprised if that eventually gets pushed out to "if you want that, get a bus mastering 65816 expansion card".


Quote

But I also would not be surprised if that eventually gets pushed out to "if you want that, get a bus mastering 65816 expansion card".

Well, as long as they give it a chance, I'll be satisfied. The 65816 used to be a huge deal for me given the better high level language support, but the situation with the 65C02 has gotten better in only a few months, so I won't die if they don't use the 65816.


On 6/28/2021 at 6:18 AM, BruceMcF said:

Since the bus timings are tight, it really would be premature to check until they have a board working with the hardware they are going to use at 8MHz.

But I also would not be surprised if that eventually gets pushed out to "if you want that, get a bus mastering 65816 expansion card".

Thought I’d seen a plug-in replacement board somewhere, bit like the old 65c802 which I think isn’t available, which would give many advantages outside the flat memory 


6 hours ago, paulscottrobson said:

Thought I’d seen a plug-in replacement board somewhere, bit like the old 65c802 which I think isn’t available, which would give many advantages outside the flat memory 

Yeah, that would work for the CX16p ... but they haven't specified whether the CX16c will have a socketed CPU, so cross-model compatibility is uncertain at present. An expansion board would work with the CX16c and, if it has an expansion slot or block header to add one, the CX16e as well. Also, if it has its own dedicated High RAM, it could have a mode to work as a coprocessor as well as a bus mastering mode to work as the main CPU.

