
Has any consideration been given to making the stack ($0100 - $01FF) pageable?  Maybe zero page ($0002 - $00FF) as well?  This would greatly assist in multitasking.


The 65(c)02 cpu zero page and stack are hardwired to the first two memory pages.

I don't think you can work around that using some sort of external logic either.


You certainly *can* work around it using external logic. A normal access to zero page just has a high address byte of 0, so if one inserted some logic between the CPU address lines and the rest of the system, it could "easily" remap the value 0 to any other value, thus relocating zero page. The same could be true for the stack page.
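
To make the idea concrete, here is a minimal sketch of what such remap logic would compute, written in Python purely as a model. The function name `remap_address` and the `zp_page` / `stack_page` parameters are hypothetical; this is not how the X16 or any real board does it, just the "replace a high byte of 0 or 1 with something else" idea from above.

```python
def remap_address(addr: int, zp_page: int = 0x00, stack_page: int = 0x01) -> int:
    """Model of hypothetical logic sitting between the CPU and the system bus.

    zp_page and stack_page are assumed "relocation registers" holding the
    high address byte that should replace page $00 and page $01.
    """
    high, low = addr >> 8, addr & 0xFF
    if high == 0x00:        # CPU is addressing zero page
        high = zp_page
    elif high == 0x01:      # CPU is addressing the stack page
        high = stack_page
    return (high << 8) | low


# Zero page relocated to $8000, stack relocated to $9000 (arbitrary choices).
print(hex(remap_address(0x0042, zp_page=0x80, stack_page=0x90)))
print(hex(remap_address(0x01FF, zp_page=0x80, stack_page=0x90)))
```

Addresses outside the first two pages pass through unchanged, which is why the extra decoding only touches the high byte.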

I think the team is much too far along in the design, and increasing the complexity of address decoding would increase the cost of the system. So time and money are both reasons why I suspect it would not happen.


Note: The Commodore 128 had the functionality. Here is a video that talks about leveraging it (though I've not watched it).

I do remember seeing an example program on the 128 that relocated the stack (I think it was) onto screen memory to clear the 40-column screen rapidly: point the stack at a screen page, push the accumulator 256 times, and repeat for four pages.
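
For anyone who wants to see why that trick works, here is a rough Python model of it. It is only a simulation under assumptions: `SCREEN_BASE = 0x0400` is my assumption for the 40-column screen address, the function name is made up, and the real program would of course be a handful of 6502 instructions plus the relocation register writes.

```python
SCREEN_BASE = 0x0400  # assumed 40-column screen address, for illustration only


def clear_screen_via_stack(memory: bytearray, fill: int,
                           screen_base: int = SCREEN_BASE, pages: int = 4) -> None:
    """Simulate filling screen memory by pushing onto a relocated stack."""
    for page in range(pages):
        stack_page = (screen_base >> 8) + page  # point the stack at this page
        sp = 0xFF                               # start at the top of the page
        for _ in range(256):
            memory[(stack_page << 8) | sp] = fill  # PHA stores at page:SP ...
            sp = (sp - 1) & 0xFF                   # ... then decrements SP


memory = bytearray(b"\xAA" * 0x10000)
clear_screen_via_stack(memory, 0x20)  # 0x20 chosen as a "blank" fill value
```

Four pages cover 1024 bytes, slightly more than the 1000 cells of a 40x25 screen, so the last 24 bytes after the screen get overwritten too.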

 


Note: if you are anything like me, you'll be yelling at that video as he tries to work out the logic in real time to manage manipulating the various pages.


No, there will not be paging of the stack or zero page. Lorin has been pretty firm that the memory management is finalized, aside from parts of VERA and the expansion ports (which are still in flux).

 


Here's how to run 2 processes on a Commander X16:

Buy 2 of them.  😄

 

(But seriously, 8 bit computers are kind of single task machines aren't they?)

Edited by x16tial
41 minutes ago, x16tial said:

Here's how to run 2 processes on a Commander X16:

Buy 2 of them.  😄

 

(But seriously, 8 bit computers are kind of single task machines aren't they?)

They can multitask just as fast as they can do anything else (which is to say, not very fast by modern standards). You can split the stack into multiple smaller segments so that only the stack pointer has to be adjusted on a task switch, or a task switch can swap the stack & some / all of zero page. You still won't have memory protection or other features provided by modern systems, but it can be done.
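
As a rough illustration of the first approach (splitting the one stack page into fixed segments), here is a Python sketch. The names `stack_segments` and `SegmentedStacks` are hypothetical, and it assumes equal-sized segments; the only thing a task switch touches is the stack pointer value.

```python
def stack_segments(num_tasks: int) -> list[tuple[int, int]]:
    """Split the 256-byte stack page into equal segments.

    Returns (lowest_offset, initial_sp) per task. The 6502 stack grows
    downward, so each task starts with SP at the top of its segment.
    """
    if num_tasks <= 0 or 256 % num_tasks != 0:
        raise ValueError("num_tasks must divide 256 evenly")
    size = 256 // num_tasks
    return [(i * size, i * size + size - 1) for i in range(num_tasks)]


class SegmentedStacks:
    """Keeps one saved stack pointer per task; nothing is copied on a switch."""

    def __init__(self, num_tasks: int) -> None:
        self.saved_sp = [top for _, top in stack_segments(num_tasks)]
        self.current = 0

    def switch(self, cpu_sp: int, next_task: int) -> int:
        self.saved_sp[self.current] = cpu_sp  # remember where this task was
        self.current = next_task
        return self.saved_sp[next_task]       # value to load back into SP


tasks = SegmentedStacks(4)   # four tasks, 64 bytes of stack each
sp = tasks.switch(0x3A, 2)   # task 0 pauses with SP=$3A, task 2 resumes
```

Nothing here stops a task from running off the bottom of its segment into a neighbour's, which is the lack of memory protection mentioned above.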


I've seen a multitasking Unix-like OS for the Commodore 64 - GeckOS I think it was called. Each task uses its own stack starting at $013F, so at most 64 bytes get swapped in there when changing tasks.


Thought was seriously given to using a processor that supports the 65C02 instruction set but also already has relocatable stack and direct pages. However, there was still a problem; this processor has part of the address bus and data bus sharing the same lines via multiplexing, and the external demux logic was determined to be too much to deal with.

And that processor is the 65C816.

2 hours ago, Scott Robison said:

They can multitask just as fast as they can do anything else (which is to say, not very fast by modern standards).

Right, which is basically what I was implying.  It *can* be done, but... why?  Part of the charm (imo) is the single task nature of the machine.

10 minutes ago, x16tial said:

Right, which is basically what I was implying.  It *can* be done, but... why?  Part of the charm (imo) is the single task nature of the machine.

Agreed. I wasn't sure if you meant "physically cannot" or "logically why would you". Sorry to be pedantic. 🙂


It's not only about multitasking in the sense of a typical multiprocessing OS. What about something like complex video or music handlers? There are portions of code that have to be executed periodically, like a complex subroutine that can switch context completely without needing to copy data in and out. Maybe not even the whole zero page but something like a block of so-called registers: 256 of them multiplied by 16 gives us 4096 bytes of easily accessible memory. And probably the same with the stack, 16 instances of a 256-byte stack. 8K of RAM in total.

Treat it not as a request but rather as a theoretical question - how would you use such an addition?

11 hours ago, Roman K said:

It's not only about multitasking in the sense of a typical multiprocessing OS. What about something like complex video or music handlers? There are portions of code that have to be executed periodically, like a complex subroutine that can switch context completely without needing to copy data in and out. Maybe not even the whole zero page but something like a block of so-called registers: 256 of them multiplied by 16 gives us 4096 bytes of easily accessible memory. And probably the same with the stack, 16 instances of a 256-byte stack. 8K of RAM in total.

Treat it not as a request but rather as a theoretical question - how would you use such an addition?

I mean, we already have banked RAM that can do all of this "putting data in/out of context" thing you're referring to, except in whole blocks of 8K instead of your subset of registers in the ZP. And if I'm working with heavy A/V data, I care about the quantity way more than the exact cycle count of accessing it, because all the cycle count savings in the world won't help if I have to slurp in new data from external storage - we're talking about multiple orders of magnitude difference to copy from memory versus copying from the SD card. And if I'm stuck with a loading screen, I want to be stuck only and exactly once, if possible.

And honestly, if I'm in an A/V situation where I desperately need to maximize the bandwidth to, say, the VERA -- so much so that I can't afford the access penalty of banked himem -- then I'm going to be highly interested in an expansion card that provides hardware DMA. In fact, maybe a card that interfaces with its own SD card and caches the raw binary image of the card (or, ideally, a specified file in a filesystem on the card, maybe through some global launcher that comes with the card, but I'm happy to live within less sophisticated features), and that can then be instructed to use DMA to push selected regions of its local storage to a single address (piping it to the VERA or another device), to an address range (dumping it to RAM), or even to a series of RAM banks. Or hell, even if the card simply provided DMA to copy things between memory ranges and devices on the X16, with no additional memory, that would still be faster than any software copy by an order of magnitude, if it really came down to that.

But then, there's already a lot about the X16 that wouldn't have been remotely possible on the C64, just because the X16 is 8x the clock speed. Pushing it further seems likely to run into more basic problems involving the finite amount of memory on the system, requiring different hardware solutions to expand the available memory so that the increased bandwidth is... well, useful. About the only special purpose where I'd even think I'd want DMA is game programming, to quickly transfer large quantities of assets to the VERA without having to sit on a black screen while 128K or whatever is spooled.

Really, multitasking seems like the only significant purpose served by swapping the ZP and stack, and I have a whole essay I could go into about that (tl;dr, I'm not a fan and think you really want an entirely different CPU family and system architecture if you really want to look into multitasking as anything more than an extremely fragile parlour trick).

Edited by StephenHorn
typo

Thank you for such a detailed answer!

Is it possible to make an expansion card with an onboard DMA controller, given the current schematics?


Yes. Lorin Millsap posted a primer for designing such hardware. The expansion slots give access to the full system bus and the RDY, bus enable, and IRQ lines. That’s enough to completely shut down the CPU and drive the system from an expansion port.

I think the most immediate need / benefit for DMA is for PCM audio playback. At 16-bit 48K stereo quality, the DSP burns through a 4K buffer extremely quickly (if I did my math right, in about 1/48 of a second), so basically a little less than once per frame, you must blit 4K into the DSP.

 

27 minutes ago, ZeroByte said:

Yes. Lorin Millsap posted a primer for designing such hardware. The expansion slots give access to the full system bus and the RDY, bus enable, and IRQ lines. That’s enough to completely shut down the CPU and drive the system from an expansion port.

I think the most immediate need / benefit for DMA is for PCM audio playback. At 16-bit 48K stereo quality, the DSP burns through a 4K buffer extremely quickly (if I did my math right, in about 1/48 of a second), so basically a little less than once per frame, you must blit 4K into the DSP.

 

Yes, if that math is right, you need to pump in about 3.2 KB (3,200 bytes) every 1/60th of a second.
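
For anyone who wants to check the numbers, here is the arithmetic as a short Python snippet. The inputs are the assumptions from the posts above (48 kHz, 16-bit, stereo, a 4 KiB buffer) plus an assumed 60 Hz frame rate; the constant names are mine.

```python
SAMPLE_RATE_HZ = 48_000   # "48K" from the post, taken as exactly 48,000 Hz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 2              # stereo
BUFFER_BYTES = 4096       # the 4K buffer, taken as 4 KiB
FRAME_RATE_HZ = 60        # assumed video frame rate

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
bytes_per_frame = bytes_per_second / FRAME_RATE_HZ
buffer_seconds = BUFFER_BYTES / bytes_per_second
frames_per_buffer = buffer_seconds * FRAME_RATE_HZ

print(f"{bytes_per_second} bytes/s, {bytes_per_frame:.0f} bytes per frame")
print(f"4 KiB lasts {buffer_seconds * 1000:.1f} ms, or {frames_per_buffer:.2f} frames")
```

So a full 4 KiB buffer lasts a bit longer than one frame, and keeping up means moving 3,200 bytes per frame on average.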


The reason I asked is because the C128 could do it.   I had a C128 30+ years ago and did not have the skill at the time to write a task switching kernel.  I was hopeful that the X16 would be able to do this and had hoped to write a task switching kernel that could do this.  And yes, I saw both GeckOS and Contiki and their ability to switch tasks by subdividing the stack.   I just found that limiting.  I may still do this by subdividing the stack.

36 minutes ago, Michael Kaiser said:

The reason I asked is because the C128 could do it.   I had a C128 30+ years ago and did not have the skill at the time to write a task switching kernel.  I was hopeful that the X16 would be able to do this and had hoped to write a task switching kernel that could do this.  And yes, I saw both GeckOS and Contiki and their ability to switch tasks by subdividing the stack.   I just found that limiting.  I may still do this by subdividing the stack.

You are correct, not having the option is more limiting than having the option. But the option won't be there, so manual copying of the stack / zero page will be required.
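
A minimal sketch of what that manual copying amounts to, modelled in Python rather than 6502 assembly. The `Task` class and `context_switch` function are hypothetical names, and the ranges assume the $0002 - $00FF zero page and $0100 - $01FF stack mentioned at the top of the thread; a real kernel would likely save far less than the full pages.

```python
ZP_START, ZP_END = 0x0002, 0x0100        # zero page, end exclusive
STACK_START, STACK_END = 0x0100, 0x0200  # stack page, end exclusive


class Task:
    """Saved per-task copy of zero page, the stack page, and the stack pointer."""

    def __init__(self) -> None:
        self.zero_page = bytes(ZP_END - ZP_START)
        self.stack = bytes(STACK_END - STACK_START)
        self.sp = 0xFF


def context_switch(memory: bytearray, cpu_sp: int,
                   outgoing: Task, incoming: Task) -> int:
    """Copy the live pages out to `outgoing`, copy `incoming` in, return new SP."""
    outgoing.zero_page = bytes(memory[ZP_START:ZP_END])
    outgoing.stack = bytes(memory[STACK_START:STACK_END])
    outgoing.sp = cpu_sp
    memory[ZP_START:ZP_END] = incoming.zero_page
    memory[STACK_START:STACK_END] = incoming.stack
    return incoming.sp


memory = bytearray(0x10000)
task_a, task_b = Task(), Task()
memory[0x0010] = 0x11                                 # task A's zero page data
new_sp = context_switch(memory, 0xFE, task_a, task_b)  # A pauses, B runs
```

That is roughly 510 bytes out and 510 bytes in per switch if you copy everything, which is the cost the relocatable pages on the C128 would have avoided.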

I had to do something similar (though much less complex) for my BASIC PREPROCESSOR so that I could recursively call into BASIC to use the crunch routine from assembly. When I SYS $0401 to my routine, I save the zero page and $200-$3FF values that BASIC depends on, JSR into BASIC (which is already running my BASIC program), then restore them. In my case I think it is only about six bytes.


Is it possible to implement additional mapping via an expansion card? Is there any documentation on how banking is implemented? And how can memory values be overridden? For example, a device listens to some memory addresses (I/O-to-memory mapping, as I understand it) and provides its own memory instead. That's the way it works for devices, but I'm just curious: what happens to the original memory then?


I wouldn't think so - The thing about DMA is that either the DMA device is driving, or the CPU is driving. I don't think there's any way for an expansion slot to take over the bus and override the glue logic, which is what would need to happen for an expansion device to supplant a system peripheral. A device in the expansion slot can listen to the entire address range - e.g. a debugger interface card. But I don't think it's going to have the power to say, "No no. I don't think VERA should respond to $9f24 - that's me, and I'm going to do X-Y-Z instead." The glue logic is going to be sending CE signals to whatever chips are supposed to receive them for any given address/range. E.g. if the bus has $9f40, then the glue logic will be sending CE to the YM2151, regardless of whether that was asserted by the CPU or a DMA device.
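
To illustrate the point about the glue logic, here is a toy Python model of an address decoder. The address ranges are my assumptions for illustration (only $9f24 belonging to VERA and $9f40 belonging to the YM2151 come from the paragraph above), and the names `CHIP_SELECTS` and `chip_enable` are made up.

```python
# (first address, last address, chip) - ranges are assumed for illustration
CHIP_SELECTS = [
    (0x9F20, 0x9F3F, "VERA"),
    (0x9F40, 0x9F41, "YM2151"),
]


def chip_enable(addr: int) -> str:
    """Return which chip the decoder would enable for an address on the bus.

    Note that the function takes only the address: it has no idea whether
    the CPU or a DMA device put that address on the bus.
    """
    for first, last, chip in CHIP_SELECTS:
        if first <= addr <= last:
            return chip
    return "OTHER"  # placeholder for RAM, ROM, and everything else


print(chip_enable(0x9F24))
print(chip_enable(0x9F40))
```

Since the decode depends on the address alone, an expansion card driving the bus gets the same chip selected as the CPU would.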


@ZeroByte I mean overriding the memory, not another device. Say my device has RAM on it, listens to the address bus, and responds to the CPU when some memory address is requested. If there is already a memory chip responsible for that address, how is that conflict resolved? That must come up on real devices. Or is there a dedicated I/O range that is not handled by RAM and can be used by devices?


If you want to override memory, the best you could do is to monitor writes, then do a DMA and come in behind the CPU to overwrite an address with something else. The glue logic is going to activate the RAM IC for address X whenever that address is asserted on the bus. You can't make a device that assumes the role of the bus devices and supplants the glue logic - remember that even RAM is just a device on the bus.

