Reconsidering the 65816 (W65C816S)


rje

We've seen the videos.

We know it would be like starting over, because the system only works at 8MHz.

Paul Scott Robson:

Quote

About as much chance as it being personally delivered by Luke Skywalker I think. Timing is apparently an "issue"

Kevin Williams:

Quote

Even though the system is designed to be a 65C02 based machine, I designed it such that a 65C816 will work electrically in the board.  The KERNAL isn't a fan of the 816, so it would have to be running a different OS, but we wanted to make sure people had the option to do what they wanted with the system.

Stephen Horn:

Quote

the '816 either has a conflicting opcode with the 65C02 that the kernel is using, or is expecting some new piece of memory to be mapped in a way that conflicts with what the kernel is doing, and the kernal is crashing or entering an infinite loop before it gets to VERA initialization.

Various people:

Quote

I can't muster even a little nostalgia for [the 65816]

 

But.

 

 

Paul's argument haunts me PROBABLY ONLY because I am not a good assembly language programmer:

With a faster processor, I could write a reasonably performant P-Code interpreter.  We could write Robotron (et al) in C, or a kind of Commodore-friendly STOS BASIC.

And keep everything else we've got.  (Theoretically)

 

 

What Paul is saying is EASE OF USE goes up by an order of magnitude, IF processor capability improves.

 

Edited by rje

But it's not about Ease of Use, because this is what I expect would happen:

Tom:

Quote

It only takes one extra chip to make it work as intended, and we wouldn't have the silly 4K banks. Instead, we'd have up to 16MB of RAM in 64K banks.

Capability is expanded into a full 16 bit system, and this probably ceases to have that particularly Commodore simple-retro draw.

 

Edited by rje

So in summary: if I want something that feels like a Commodore 8-bit machine, then I should be content with the 6502, and be happy that there's still a gigantic pile of RAM that I never had on the C64.

 

I do wish sound and sprites were "easier to use", but that's a different thread to create.

 


On 7/24/2021 at 1:29 AM, rje said:

But it's not about Ease of Use, because this is what I expect would happen:

Tom: "It only takes one extra chip to make it work as intended, and we wouldn't have the silly 4K banks. Instead, we'd have up to 16MB of RAM in 64K banks."

Capability is expanded into a full 16 bit system, and this probably ceases to have that particularly Commodore simple-retro draw.

But popping a 65816 into the CPU socket won't add the "one extra chip", because it has to be in the motherboard. Popping a 65816 into the CPU socket in effect gives you a 65802 with slight bus incompatibilities (SYNC is replaced by VDA/VPA and IIRC the clock outputs are DNC and a reset input).

And a bus mastering 65816 card would be pretty much the same thing, though with more room for circuitry to bridge the bus incompatibilities ... including masking out the bank address from the data lines. I would be entirely unsurprised if the problem in writing to VERA is that the bank address on the data bus, followed by the data, confuses VERA in a way that is not an issue with the 65C02 write cycle. Nor would I be surprised if the bus cycles for some of the chips only just make it with the 6502, and small variations in actual read or write delays associated with the transition between bank mode and data mode make the timing too tight to work.

But if you can get a bus mastering card to work, you get the pcode interpreter with the accumulator in 8bit mode and indexes in 16bit mode, with ops ending with JMP NEXTOP or an eight byte NEXTOP macro:
NEXTOP: INY : LDA 0,Y : TAX : JMP (OP1,X)
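
For anyone who does not read 65816 assembly, here is a rough Python sketch of what that NEXTOP sequence does: fetch the next opcode byte and jump through a table indexed by it. The opcode numbers, the handler names, and the tiny stack machine are all made up for illustration; they are not taken from any existing p-code format.

```python
# Sketch of table-driven p-code dispatch (the job NEXTOP does above).
# Opcode values and handlers are hypothetical, chosen only for illustration.

def run(code):
    stack = []

    def op_halt(pc):
        return None                      # stop the interpreter loop

    def op_push(pc):
        stack.append(code[pc + 1])       # one-byte immediate operand
        return pc + 2

    def op_add(pc):
        b, a = stack.pop(), stack.pop()
        stack.append((a + b) & 0xFFFF)   # wrap like a 16-bit machine
        return pc + 1

    # Plays the role of the OP1 jump table.
    table = {0x00: op_halt, 0x02: op_push, 0x04: op_add}

    pc = 0                               # plays the role of the Y register
    while pc is not None:
        handler = table[code[pc]]        # LDA 0,Y : TAX : JMP (OP1,X)
        pc = handler(pc)
    return stack


print(run([0x02, 3, 0x02, 4, 0x04, 0x00]))
```

The opcodes are even numbers here because JMP (OP1,X) indexes a table of two-byte addresses, so the byte fetched into X has to step by two.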

The thing is: since it can run 6502 assembly code, can run the same pcode as the 6502, except faster, and can host a compatible ROMBASIC interpreter, except faster, and since assembly code can test whether it is running on a 6502 or 65816, it might actually work as a 3rd party enhancement.

Then it would only break CX16 code if people use any of the four individual-bit-addressed operations (RMB, SMB, BBR, BBS) in their assembled code, so it just needs "enough" of an install base that people shy away from doing that.
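
As a rough illustration of which opcodes are at stake: on the 65C02 the bit operations RMBn/SMBn sit at $x7 and BBRn/BBSn at $xF, and the 65816 reuses those opcode slots for other instructions. The helper below is a hypothetical sketch that classifies a single byte already known to be an opcode. It is not a scanner for assembled programs, which would need proper instruction-length decoding to avoid flagging operands and data.

```python
def rockwell_bit_op(opcode):
    """Return the 65C02 bit-op mnemonic for this opcode byte, or None."""
    low = opcode & 0x0F
    bit = (opcode >> 4) & 0x07            # which bit (0-7) the op addresses
    if low == 0x07:                       # $07-$77 RMBn, $87-$F7 SMBn
        return ("RMB" if opcode < 0x80 else "SMB") + str(bit)
    if low == 0x0F:                       # $0F-$7F BBRn, $8F-$FF BBSn
        return ("BBR" if opcode < 0x80 else "BBS") + str(bit)
    return None


for op in (0x07, 0x97, 0x0F, 0xFF, 0xA9):
    print(hex(op), rockwell_bit_op(op))
```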
 

Edited by BruceMcF

A Commodore friendly STOS is called AMOS. 😁

Perhaps those that don't like the choices being made for the X16 should look into building their own 'dream' computer? Of course everyone else would be unhappy with >>>your<<< choices too, and ask why you can't build it differently? So there would be lots of incompatible computers... just like in the 1980s. 😉


14 hours ago, Gromit337 said:

A Commodore friendly STOS is called AMOS. 😁

Perhaps those that don't like the choices being made for the X16 should look into building their own 'dream' computer? Of course everyone else would be unhappy with >>>your<<< choices too, and ask why you can't build it differently? So there would be lots of incompatible computers... just like in the 1980s. 😉

Already done it. Pondering a RiscV based design, a sort of modern Archimedes


16 hours ago, Gromit337 said:

A Commodore friendly STOS is called AMOS. 😁

Perhaps those that don't like the choices being made for the X16 should look into building their own 'dream' computer? Of course everyone else would be unhappy with >>>your<<< choices too, and ask why you can't build it differently? So there would be lots of incompatible computers... just like in the 1980s. 😉

Great for hardware folks. Not so great for us software folks. And, of course, the odds for most of us pulling together Dave's retro Dream Team are quite low.

But luckily, the degree of flexibility to swap in a 65816 IS part of the design choices being made for the X16. The open question is whether it can be done with a CPU swap, or whether a bus mastering expansion board is the only workable option. But the board is being built to be electrically compatible with a 65816, so if there is an issue putting it in the CPU socket, it's a more subtle problem.


1 hour ago, BruceMcF said:

Great for hardware folks. Not so great for us software folks.

Well, we can always build an emulator for our non-existent dream computer. It wouldn't be much different from where we are right now with the X16 😛.


1 hour ago, Guybrush said:

Well, we can always build an emulator for our non-existent dream computer. It wouldn't be much different from where we are right now with the X16 😛.

You could also learn some HDL and invest in a Terasic DE10-Nano, possibly a few hats and a case, and try to implement it as an FPGA softcore.  Then, once the basic chipset is solidified, you can progress to the necessary OS, BIOS, Language, and Driver code, and then whatever peripheral and interface ports you want to add via USB and dongles.  Once you get all that solidified, you've taken roughly 75% of the guesswork out of designing a motherboard. Then, assuming you've stuck to (relatively) available chips in your softcore, you can either send the Gerber file you're confident enough with to PCBWay, or raise money and invest in a Nano Dimension DragonFly PCB 3D printer, and begin assembling and testing prototype boards...


18 hours ago, Guybrush said:

Well, we can always build an emulator for our non-existent dream computer. It wouldn't be much different from where we are right now with the X16 😛.

The difference is the number of people developing for the CX16, and that suggests that the existence of prototype boards and the strong likelihood of an actual computer actually is a significant difference for this kind of thing.


The marriage to backwards compatibility is directly at odds with current day Personal Computer Usage.

I love the idea of maybe getting something put into a socket or expansion port - possibly with a daughterboard because that's retro as hell - but it's more important to make the computer work to the needs of its intended users, and...well...work at its base specifications first. Good Design will carry it forward after that. There's a reason there were so many new computers back then, and I think discussions like these really demonstrate it because we have such higher demands on our stuff now.

Honestly - Why wouldn't I just get an Amiga hardware clone project, instead? (Or a Raspberry Pi with everything including the disk drive sounds.) Also - it's 2021 - I am not messing with Kickstart and Workbench the same way I did in the 90's. It was bad enough I didn't know if my disks worked or not - but my Amiga became unusable when my Kickstart 1.3 disk died. As we go forward over the next 40 years of PC history, we can see how standards, design, and completeness became more important for every new computer system right into the x86 and beyond.

Edited by Starsickle

1 hour ago, Starsickle said:

... I love the idea of maybe getting something put into a socket or expansion port - possibly with a daughterboard because that's retro as hell - but it's more important to make the computer work to the needs of its intended users, and...well...work at its base specifications first. ...

This seems like a non sequitur, since there is nobody suggesting anything that would involve putting a 65816 option ahead of "working at its base specifications", and the point about the 65816 is that if it's available, it would make the CX16 better meet the needs of its intended users.

17 hours ago, BruceMcF said:

... the point about the 65816 is that if it's available, it would make the CX16 better meet the needs of its intended users.

 

Right.  The way I look at it, it would be nice if it could simply replace the 65C02, with no other changes.  And I figure that was the original thought as well -- "can it just be plugged in and used like a super-powered 6502?"  And the answer (based on video #2) was: probably not without annoying and/or expensive extra work.  Since the system had to work first, that and other options were scrapped.  The decision makes sense.  It would be nice if the current board, being stable, allowed some experimentation, but meh.

 

Edited by rje

53 minutes ago, rje said:

Right.  The way I look at it, it would be nice if it could simply replace the 65C02, with no other changes.  And I figure that was the original thought as well -- "can it just be plugged in and used like a super-powered 6502?"  And the answer (based on video #2) was: probably not without annoying and/or expensive extra work.  Since the system had to work first, that and other options were scrapped.  The decision makes sense.  It would be nice if the current board, being stable, allowed some experimentation, but meh.

Technically, "not without finding out what the issue is with the Vera startup process". Since they are taking advantage of the early assert of the 6502 address lines, it is entirely possible that the two bus timings comply with the same read delay and write hold specs, but the actual timing is not the same, so the 6502 just slips in under the bar while the 65816 misses. For instance, Vera is faster than the system bus and is internally not synchronous with it, so it may be that Vera sometimes performs its read early enough in the cycle that the 6502 is already in a settled state, irrespective of what the timing diagram says, while the 65816 data bus is still in an ambiguous state between the bank assert and the data assert.

And it could be use of Rockwell opcodes which can easily be replaced at the cost of a handful of bytes and a handful of clock cycles.

After their long delay getting the Proto#2 board to boot up, they can quite reasonably conclude that they don't have the time to hammer that out. If it can be fixed up later so that an end user can drop in a 65816 and just go, well, so be it, but leave that until the original is out in the wild and more people who are more interested in it have a go at the problem.

ESPECIALLY since they are explicitly setting up the slots so that a board that is willing to go to the extra work required can take over as the bus master, so a 65816 bus master expansion card is always an option. That expansion card can play clock cycle games that the drop in replacement cannot play ... for instance, with a 25MHz frequency source and a FPGA with a 2x PLL, it can generate an asymmetric 8MHz clock cycle for the 65816, with a shorter PHI2=1 cycle and a longer PHI2=0 cycle, if the problem is data not being asserted soon enough in the motherboard system clock cycle.

Edit: Note that if using an FPGA as sketched above fixed the problem, it might be a single main chip board ... other than clock module, voltage translation and some resistors and caps ... since WDC also licenses a soft core of the 65816.

Edited by BruceMcF

14 minutes ago, BruceMcF said:

After their long delay getting the Proto#2 board to boot up, they can quite reasonably conclude that they don't have the time to hammer that out. If it can be fixed up later so that an end user can drop in a 65816 and just go, well, so be it, but leave that until the original is out in the wild and more people who are more interested in it have a go at the problem.

ESPECIALLY since they are explicitly setting up the slots so that a board that is willing to go to the extra work required can take over as the bus master, so a 65816 bus master expansion card is always an option. That expansion card can play clock cycle games that the drop in replacement cannot play ... for instance, with a 25MHz frequency source and a FPGA with a 2x PLL, it can generate an asymmetric 8MHz clock cycle for the 65816, with a shorter PHI2=1 cycle and a longer PHI2=0 cycle, if the problem is data not being asserted soon enough in the motherboard system clock cycle.

Both good points.  Thank you Bruce.  

 

I started thinking of the '16 again because of theoretical delays due to current chip shortages.  Granted, I suspect that's not on the critical path.


1 minute ago, rje said:

Both good points.  Thank you Bruce. 

I started thinking of the '16 again because of theoretical delays due to current chip shortages.  Granted, I suspect that's not on the critical path.

I'd think both designs can use a quite old process, and likely the same process (since WDC is a fabless design studio), and there will be available capacity in that process to produce a new batch if present stockpiles run low. It's rather the FPGA where I would not be surprised if there are logistic difficulties with some specific members of some specific families ... I know that the Foenix256 design was redone because she was having difficulty getting the specific FPGAs that she had originally designed around, and she felt that transitioning to a new family would ease the supply constraint.


  • 2 weeks later...

Anyone know the deep technical explanation for why the 65816 was scrapped? I saw David's 2nd building the ultimate 8 bit computer video, but he didn't have the time to go into it in depth.

I was one of those in the 6502 camp at the beginning, and after reading through the arguments I decided to buy parts and build a Ben Eater style solderless breadboard computer with a 65816 to see what was involved. The 65C816 and 65C02 seem to have a high degree of timing compatibility, so it really came down to adding a high speed latch, a bus transceiver, and an inverter. The reference design for demuxing BA0-7 and D0-7 is included in the 65816 datasheet (you have to read it carefully though; I wasted a lot of time on my first implementation because I did not).

I also needed to modify the low memory address decoder so that it is not used outside bank address 0 but still asserts the correct chip enable in 24 bit mode. I wanted to keep to the all DIP design style and needed to map 18 address bits (A4-A21), plus an A1-A3 OR'd signal (to detect access to $00 and $01), to chip enables for 5x SRAM, 8x IO select, 1x ROM, and 4x ZP RAM/ROM bank flipflops: a total of 19 inputs and 18 outputs. This wasn't going to be practical using discrete 7400 series DIP parts, so I ended up using 2x ATF22V10C SPLDs to handle all address decoding; it was pretty similar to the C64 PLA replacements that use 2 GAL parts (e.g. "PLA20V8", which uses 2x Lattice GAL20V8B parts). This also allowed me to scrap several 7400 series chips that previously did the address decode.

Altogether I think the count was 4 chips and 3 diodes replacing maybe 6 or 7 discrete logic chips (I don't recall, it was a whole season ago) to enable functionality for something that might be CX16 compatible (hard to tell without a VERA card) and can seamlessly transition between 6502 emulation mode and full 65816.


3 hours ago, Wavicle said:

Anyone know the deep technical explanation for why the 65816 was scrapped? I saw David's 2nd building the ultimate 8 bit computer video, but he didn't have the time to go into it in depth.

However, what he said there was why ... using the 65816 in 24bit address mode was using a CPLD to dereference the address, while going to the 64K address map with banking allowed the chip select to be done with glue logic. And the 64K address map could be done in the VIC-20 style that Dave preferred, where every part of the memory map only does one thing.

Of course, we don't get a blow-by-blow account of development, especially false starts and dead ends, so we don't know how much of the description of development hell in the second video was just general experience and how much was experience with the first generation of the design.

However, once you have a 64K address map, you can build the board with a 65C02 to have one less point of difference between the original Commodore Kernel and BASIC ROM code that was the starting point, so you have one fewer problem areas in trying to get the board to boot up at full 8MHz speed. Then if the board can boot with the 65C02, it's possible to see if it will boot with the 65C816.

At this point, it doesn't boot with the 65C816, seemingly because of some problem in the Vera initialization, but the board has been designed to be electrically compatible with putting a 65C816 in the 65C02 socket.

Edited by BruceMcF

15 hours ago, Wavicle said:

Anyone know the deep technical explanation for why the 65816 was scrapped? I saw David's 2nd building the ultimate 8 bit computer video, but he didn't have the time to go into it in depth.

There are availability arguments about the 65816, it's not as popular as the 65C02 and it could go out of usage.

The design is basically the 6502 Reference design with Vera wedged on with kludges to try to make the timing work.   There was a desire to use Commodore Basic, why I don't know, because it's absolutely dire, and using it to teach programming to kids verges on criminal.

The one thing that has always worked is Vera which is on an FPGA, it's been revamped slightly but it always worked properly.

You might as well reduce the abilities of Vera slightly (or, say, be able to map the RAM onto a 6502 address space) and then put the CPU in the FPGA - there are umpteen 6502 cores available - and they run way quicker. You'd get a much better machine. 

RJE wrote about doing a P-System interpreter and I know Bruce was looking at the 6502 Pascal Compilers that are out there. If the 6502 was 5-10 times faster than it is now, say the speed of the Mega65's core, the problems about speed, data sizes and address space simply go away. 

Isn't going to happen though.


4 hours ago, paulscottrobson said:

RJE wrote about doing a P-System interpreter and I know Bruce was looking at the 6502 Pascal Compilers that are out there. If the 6502 was 5-10 times faster than it is now, say the speed of the Mega65's core, the problems about speed, data sizes and address space simply go away. 

And there's the "solution" -- if I want an interpreted HLL badly enough, I'll do it on the Mega65.

 


18 hours ago, BruceMcF said:

However, what he said there was why ... using the 65816 in 24bit address mode was using a CPLD to dereference the address, while going to the 64K address map with banking allowed the chip select to be done with glue logic. And the 64K address map could be done in the VIC-20 style that Dave preferred, where every part of the memory map only does one thing.

Of course, we don't get a blow-by-blow account of development, especially false starts and dead ends, so we don't know how much of the description of development hell in the second video was just general experience and how much was experience with the first generation of the design.

However, once you have a 64K address map, you can build the board with a 65C02 to have one less point of difference between the original Commodore Kernel and BASIC ROM code that was the starting point, so you have one fewer problem areas in trying to get the board to boot up at full 8MHz speed. Then if the board can boot with the 65C02, it's possible to see if it will boot with the 65C816.

At this point, it doesn't boot with the 65C816, seemingly because of some problem in the Vera initialization, but the board has been designed to be electrically compatible with putting a 65C816 in the 65C02 socket.

I'm not sure that's what he said; he said it "requires a lot of external circuitry to decode this and split it out" (referring to demuxing the signals), but demultiplexing the top address bits from the data bits requires two 7400 series chips, and an inverter which might require one additional chip (no additional chip if you have an extra NAND gate or inverter lying around). This is my demux circuit on the breadboard:

[image: demux circuit on the breadboard]

The top component is a latch and the bottom is a bus transceiver (the clock and RW lines are not here because I'm in the middle of a layout change). The green wires are address bus, blue wires data. You can see the data lines go from the CPU to the latch and the three wires on the right of the latch are BA0-BA2 (or A16-A18 if you prefer). If the concern is with the low 16 addresses going to the chip select circuit when BA0-BA7 != 0, you can fix that with another 74x245 on A8-A15 and a wired OR of BA0-BA7 connected to OE#.
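
To make the latch plus transceiver arrangement concrete, here is a small behavioural sketch in Python. It assumes the usual arrangement in which the latch is transparent while PHI2 is low (when the CPU drives the bank address on D0-D7) and the transceiver is enabled while PHI2 is high. The class and function names are made up for illustration, and it models logic levels only, not propagation delays.

```python
class BankDemux:
    """Behavioural model of a transparent latch plus a bus transceiver."""

    def __init__(self):
        self.bank = 0                  # latched BA0-BA7 (A16-A23)

    def step(self, phi2, cpu_d):
        """Return (bank, system_data) for one clock phase."""
        if phi2 == 0:
            self.bank = cpu_d & 0xFF   # latch transparent: follows D0-D7
            return self.bank, None     # transceiver off: system bus sees nothing
        return self.bank, cpu_d & 0xFF # latch holds, transceiver passes data


def full_address(bank, a15_0):
    """Combine the latched bank with A0-A15 into a 24-bit address."""
    return ((bank & 0xFF) << 16) | (a15_0 & 0xFFFF)


def low_decoder_enabled(bank):
    """The 64K chip-select decoder should only respond when BA0-BA7 == 0."""
    return (bank & 0xFF) == 0


demux = BankDemux()
print(demux.step(0, 0x02))             # PHI2 low: CPU drives bank $02
print(demux.step(1, 0xEA))             # PHI2 high: CPU drives data $EA
print(hex(full_address(demux.bank, 0x9F00)))
```

The last helper corresponds to the wired OR of BA0-BA7 mentioned above: any nonzero bank keeps the low 64K decode from firing.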

A GAL/SPLD (I don't think this would need a CPLD) certainly decreases the component count and wiring complexity, but even with 74xx logic chips, I don't think it's going to increase the complexity much above the ten or so 74xx parts already on the board for bank and chip select generation. Hence why I asked the question.


6 hours ago, paulscottrobson said:

There are availability arguments about the 65816, it's not as popular as the 65C02 and it could go out of usage.

 

Has WDC given hints about this? My understanding is that most of their money comes from licensing the design and I suspect that their production of these parts exists primarily so that customers can prototype designs with an inexpensive part before committing to a core license.

To be fair, if I'm reading the date codes correctly, my brand new W65C816S purchased from Mouser in April was manufactured in 2010 while my brand new W65C02S purchased at the same time was manufactured in 2019. That would seem to suggest that 6502s move significantly more volume.


1 hour ago, Wavicle said:

A GAL/SPLD (I don't think this would need a CPLD) certainly decreases the component count and wiring complexity, but even with 74xx logic chips, I don't think it's going to increase the complexity much above the ten or so 74xx parts already on the board for bank and chip select generation. Hence why I asked the question.

The design of the board he describes seems to be using a CPLD or FPGA for that, at least by the time that "memory map version 1" was dropped for the simpler "memory map version 2".

Of course, at the time, as at present, they were not talking about what was going on in detail, but they were at least referring to issues with bus timing, and that would as easily be a timing issue rather than a complexity issue ... the in production, through pin static RAM they are using has pretty tight timing for chip select partway through an 8MHz clock cycle, and pulling the chip select logic into a single PLD speeds up the timing of the assert. To put the chip selects and A16-A23 out through a single PLD requires more than the 10 output pins available on a 22V10, hence a CPLD or as part of the functions on an FPGA.

(In the Foenix256, that's done through a bus master FPGA, which AFAIR in the latest design has been integrated with the audio master FPGA.)


1 hour ago, BruceMcF said:

The design of the board he describes seems to be using a CPLD or FPGA for that, at least by the time that "memory map version 1" was dropped for the simpler "memory map version 2".

Of course, at the time, as at present, they were not talking about what was going on in detail, but they were at least referring to issues with bus timing, and that would as easily be a timing issue rather than a complexity issue ... the in production, through pin static RAM they are using has pretty tight timing for chip select partway through an 8MHz clock cycle, and pulling the chip select logic into a single PLD speeds up the timing of the assert. To put the chip selects and A16-A23 out through a single PLD requires more than the 10 output pins available on a 22V10, hence a CPLD or as part of the functions on an FPGA.

(In the Foenix256, that's done through a bus master FPGA, which AFAIR in the latest design has been integrated with the audio master FPGA.)

I think they are going to have a lot of difficulty pulling off 8MHz, even exploiting early address bus stabilization, if the parts shown in the videos are correct (xx74ACTxx logic, Alliance AS6C4008 SRAM). With tACC=70ns on the CPU and tACE=55ns on the SRAM, there may only be 15ns available from address stable until the correct CS# must be asserted. Those gates have a propagation delay up to 9.5ns, therefore if your output requires more than 1 level of combinatorial logic (which I think is guaranteed to be the case for memory below $9F00), you have the potential to hit a timing violation. Switching to faster xx74AHCTxx components will let you have two levels of logic if your load capacitance and temperature are sufficiently low (i.e. 15 pF and 25C or better). If I recall correctly, the problem with the v2 prototype board was that CS and PHI2 were effectively being ANDed together which should never work at 8MHz because you have less than 55ns from PHI2 until the end of tACC.

A16-A23 should not need to go through a PLD unless it is also demuxing the data bus, which is quite a bit more expensive than using a 74AHCT245 and only 1ns faster. In my experimental case, I only feed the high address bits into the SPLD and only the chip select signals come out.

Incidentally, the SPLD has a propagation delay as low as 7ns, so you could go through two of them without a timing violation at 8MHz on a 65C02. I'm no expert at working with these old parts, but going strictly by the numbers, I just don't see how CX16 is going to be able to run stable at 8MHz unless the published timing values are all lies. If they do manage it, I am probably going to buy a sacrificial unit just to connect to the logic analyzer and figure out HOW. At 4MHz, all of this concern goes away and delay for demuxing the data bus is not going to be fatal to a 65C816.
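
The arithmetic behind those numbers can be written out as a short Python sketch. It uses only the figures quoted in this post (tACC=70ns, tACE=55ns, 9.5ns per 74ACT gate, 7ns per SPLD) and treats worst-case propagation delays as simply additive, which is a simplification; the function names are made up for illustration.

```python
def decode_budget_ns(t_acc_ns, t_ace_ns):
    """Time left for chip-select decode after the SRAM's own access time."""
    return t_acc_ns - t_ace_ns


def max_logic_levels(budget_ns, t_pd_ns):
    """How many back-to-back stages of delay t_pd fit inside the budget."""
    return int(budget_ns // t_pd_ns)


budget = decode_budget_ns(70, 55)
print(budget)                          # 15
print(max_logic_levels(budget, 9.5))   # 1  (74ACT gates)
print(max_logic_levels(budget, 7))     # 2  (SPLDs at 7ns each)
```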
