Reconsidering the 65816 (W65C816S)


rje

I think they are going to have a lot of difficulty pulling off 8MHz, even exploiting early address bus stabilization, if the parts shown in the videos are correct (74ACTxx logic, Alliance AS6C4008 SRAM). With tACC=70ns on the CPU and tACE=55ns on the SRAM, there may be only 15ns available from address-stable until the correct CS# must be asserted. Those gates have a propagation delay of up to 9.5ns, so if your output requires more than one level of combinatorial logic (which I think is guaranteed to be the case for memory below $9F00), you have the potential to hit a timing violation. Switching to faster 74AHCTxx components will let you have two levels of logic if your load capacitance and temperature are sufficiently low (i.e. 15pF and 25C or better). If I recall correctly, the problem with the v2 prototype board was that CS and PHI2 were effectively being ANDed together, which can never work at 8MHz because you have less than 55ns from PHI2 until the end of tACC.
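To put numbers on it, here is the budget as a few lines of arithmetic (the timing figures are the quoted datasheet values, so treat them as assumptions):

```python
# Rough chip-select timing budget at 8 MHz. All values in nanoseconds;
# the datasheet numbers are the ones quoted above, not measurements.
T_ACC    = 70.0   # W65C02S address access time
T_ACE    = 55.0   # AS6C4008 chip-enable access time
T_PD_ACT = 9.5    # worst-case 74ACT gate propagation delay

# Window from "address stable" until CS# must be asserted:
decode_budget = T_ACC - T_ACE
print(f"decode budget: {decode_budget:.1f} ns")   # 15.0 ns

# Levels of 74ACT combinatorial logic that fit in that window:
levels = int(decode_budget // T_PD_ACT)
print(f"74ACT logic levels that fit: {levels}")   # 1
```

One worst-case gate fits; two do not, which is the whole problem.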

A16-A23 should not need to go through a PLD unless it is also demuxing the data bus, which is quite a bit more expensive than using a 74AHCT245 and only 1ns faster. In my experimental case, I only feed the high address bits into the SPLD and only the chip select signals come out.

Incidentally, the SPLD has a propagation delay as low as 7ns, so you could go through two of them without a timing violation at 8MHz on a 65C02. I'm no expert with these old parts, but going strictly by the numbers, I just don't see how CX16 is going to be able to run stable at 8MHz unless the published timing values are all lies. If they do manage it, I am probably going to buy a sacrificial unit just to connect to the logic analyzer and figure out HOW. At 4MHz, all of this concern goes away and the delay for demuxing the data bus is not going to be fatal to a 65C816.

HCT is not faster. ACT is way faster. However, as a whole you are right.

 

 

Sent from my iPhone using Tapatalk


16 hours ago, Wavicle said:

...if I'm reading the date codes correctly, my brand new W65C816S purchased from Mouser in April was manufactured in 2010 while my brand new W65C02S purchased at the same time was manufactured in 2019. That would seem to suggest that 6502s move significantly more volume.

Wow!  Thank you for sharing; current popularity is a data point I hadn't considered before.


17 hours ago, Wavicle said:

I'm not sure that's what he said; he said it "requires a lot of external circuitry to decode this and split it out" (referring to demuxing the signals), but demultiplexing the top address bits from the data bits requires two 7400-series chips, plus an inverter which might require one additional chip (no additional chip if you have a spare NAND gate or inverter lying around). This is my demux circuit on the breadboard:

image.thumb.png
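The idea is simple enough to model in a few lines. On the 65816, the bank address A16-A23 appears on the data pins while PHI2 is low; a transparent latch captures it before the same pins switch to carrying data. The '573 part choice here is just an assumption for illustration:

```python
# Toy model of demuxing the 65816's multiplexed bank address. While PHI2
# is low the CPU drives A16-A23 on D0-D7; a transparent latch (e.g. a
# 74AHCT573, illustrative choice) passes the bus through during PHI2 low
# and holds its value once PHI2 rises and the pins switch to data.
def bank_latch(phi2: int, data_bus: int, held: int) -> int:
    """Transparent while PHI2 is low; holds the last value when PHI2 is high."""
    return data_bus if phi2 == 0 else held

# PHI2 low: bank byte $02 is on the bus and flows through the latch.
bank = bank_latch(phi2=0, data_bus=0x02, held=0x00)
# PHI2 high: the bus now carries a data byte ($EA), but the latch holds $02.
bank = bank_latch(phi2=1, data_bus=0xEA, held=bank)
print(hex(bank))  # 0x2
```

The second chip in the real circuit is just a bus transceiver so the data side sees a clean D0-D7 during PHI2 high.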

Wow, your breadboards are as clean as a Ben Eater video.  Mine are always so messy... but then I'm a rank amateur.

 


1 hour ago, rje said:

Wow, your breadboards are as clean as a Ben Eater video.  Mine are always so messy... but then I'm a rank amateur.

I know what you mean. I put together my first breadboard last weekend. Ben Eater's CPU clock kit. I think it turned out okay (except probably for wire color and consistency, but I'm color blind so ... there's that). But it isn't as neat because I mostly used pre-cut wires and there wasn't always a good length for certain runs. I did tweak it a little bit but I'm happy enough with it, and more importantly, happy I was able to troubleshoot everything without too much angst. 

cpu-clock-board.jpg


1 hour ago, rje said:

Wow, your breadboards are as clean as a Ben Eater video.  Mine are always so messy... but then I'm a rank amateur.

 

In one of Ben's videos he covered how he measures and cuts wires to make the breadboard design clean. I shamelessly copied his technique and never looked back. Since doing so, I find that I'm much more likely to return to a side project and tinker with it some more if it looks clean... it makes the project "inviting" if you will. I might use flexible jumpers to prove a concept, but once I'm happy, those are replaced with measured wires (e.g., the yellow wires in my board picture are clock signals and the yellow flexible jumper is connecting an external clock generator to the board; the clock wire to the CPU hadn't been cut yet, the wire going off to the left connects to the 6522 VIA).

I think the only improvement I've made that I don't recall hearing from Ben is that the distance from the top of one row of holes to the center channel of the breadboard is exactly 6 holes. Once I've measured the wire length, I lay the wire across a row and snip flush inside the channel. The wire is now exactly the correct length after stripping the ends. The other hint that I'm not sure I heard Ben mention is that for non-trivial designs, do not buy the cheap no-name solderless breadboards off of Amazon. The one I built this prototype on is a Digilent large solderless breadboard I purchased from Mouser (https://www.mouser.com/ProductDetail/424-340-002-1); it was 3x the cost of a set of similar boards sold under a dozen random names on Amazon but contains 1/10th the frustration.


13 hours ago, Wavicle said:

I think they are going to have a lot of difficulty pulling off 8MHz, even exploiting early address bus stabilization, if the parts shown in the videos are correct (xx74ACTxx logic, Alliance AS6C4008 SRAM). With tACC=70ns on the CPU and tACE=55ns on the SRAM, there may only be 15ns available from address stable until the correct CS# must be asserted.

When I was reading minimum tACC, it seemed to be determined by voltage, not clock speed: 70ns at 3.0V-3.3V and 30ns at 5.0V. IOW, use the voltage heading rather than the clock speed heading for that one.

The clock pulse widths are determined by clock speed: pulse width high and low are 62ns/63ns at 8MHz. At 5.0V the maximum address setup after the clock rises is 30ns, so there are 32ns until the clock falls if the RAM select and R/W are ORed with the clock, which is the time available for two levels of combinatorial logic to generate those selects. Then the propagation delay for the RAM chip select and the Read or Write select from the OR with PHI2 would also be approximately the propagation delay of the deselect on the rise of the clock, so the RAM chip is selected for roughly 60ns.
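Spelled out as arithmetic (the figures are the ones I quoted above, so treat them as assumptions from my reading of the datasheet):

```python
# 5 V budget at 8 MHz, in ns. Figures as quoted in this post.
t_pwh = 62.0   # PHI2 high pulse width at 8 MHz
t_ads = 30.0   # max address setup after PHI2 rises, at 5.0 V

# If RAM select and R/W are ORed with the clock, this is the window
# for the select-generating combinatorial logic before PHI2 falls:
decode_window = t_pwh - t_ads
print(f"{decode_window:.0f} ns")   # 32 ns, roughly two logic levels
```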

But, famously, that is an 8080/Z80 style bus interface, and there are other 8-bit bus interfaces that require a different sequence on their bus interface lines.


21 minutes ago, BruceMcF said:

When I was reading minimum tACC, it seemed to be determined by voltage, not clock speed: 70ns at 3.0V-3.3V and 30ns at 5.0V. IOW, use the voltage heading rather than the clock speed heading for that one.

The clock pulse widths are determined by clock speed: pulse width high and low are 62ns/63ns at 8MHz. At 5.0V the maximum address setup after the clock rises is 30ns, so there are 32ns until the clock falls if the RAM select and R/W are ORed with the clock, which is the time available for two levels of combinatorial logic to generate those selects. Then the propagation delay for the RAM chip select and the Read or Write select from the OR with PHI2 would also be approximately the propagation delay of the deselect on the rise of the clock, so the RAM chip is selected for roughly 60ns.

But, famously, that is an 8080/Z80 style bus interface, and there are other 8-bit bus interfaces that require a different sequence on their bus interface lines.

You raise a valid point, but it isn't quite that straightforward. We're looking for "how long after the falling edge of PHI2 until the address is stable" and "how long before the next falling edge does the data have to be stable". In the datasheet these are tADS (address setup time) and tDSR (read data setup time). The deltas between 3.3 and 5 volts are 10ns and 5ns respectively. You could potentially have a tACC of 85ns at 5V, but if CS is ANDed with PHI2, that only gives 5ns extra; not enough to avoid a timing violation with 55ns SRAM.
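The arithmetic behind that, using the deltas quoted above (datasheet figures, so assumptions as far as this sketch goes):

```python
# 3.3 V -> 5 V timing gains, in ns, as quoted in this post.
t_acc_3v3  = 70.0   # effective access window at 3.3 V
t_ads_gain = 10.0   # tADS (address setup) improves by 10 ns at 5 V
t_dsr_gain = 5.0    # tDSR (read data setup) improves by 5 ns at 5 V

t_acc_5v = t_acc_3v3 + t_ads_gain + t_dsr_gain
print(f"tACC at 5 V: {t_acc_5v:.0f} ns")   # 85 ns

# But if CS# is ANDed with PHI2, the RAM only sees CE during PHI2 high,
# so the only gain on that path is the tDSR improvement:
print(f"gain on the CS-gated path: {t_dsr_gain:.0f} ns")   # 5 ns
```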


1 hour ago, Wavicle said:

The other hint that I'm not sure I heard Ben mention is that for non-trivial designs, do not buy the cheap no-name solderless breadboards off of Amazon.

Yeah, he mentions this on his website for sure.


1 hour ago, Wavicle said:

You raise a valid point, but it isn't quite that straightforward. We're looking for "how long after the falling edge of PHI2 until the address is stable" and "how long before the next falling edge does the data have to be stable". In the datasheet these are tADS (address setup time) and tDSR (read data setup time). The deltas between 3.3 and 5 volts are 10ns and 5ns respectively. You could potentially have a tACC of 85ns at 5V, but if CS is ANDed with PHI2, that only gives 5ns extra; not enough to avoid a timing violation with 55ns SRAM.

Looking at the AS6C4008 timings more closely, the timing may be looser if it is just OE and WE that are ANDed with PHI2, with Chip Select generated as soon as it's ready ... then the Read cycle has a maximum of 55ns after CE and 30ns after output enable before valid data is output by the RAM. If CE is generated 10ns before the falling PHI2, there will be valid data from 45ns into the 6502 Read cycle through to the rising RAM OE. That more than meets the minimum 10ns tDSR.

The empirical question is whether the read data hold time is satisfied by the propagation delay of the rising OE and the transition of the RAM from output to high impedance, where no minimum is specified for the latter. Maybe an AHCT part wouldn't actually be helpful on that side of the timing diagram.

The write cycle doesn't look like an issue ... under WE control, it needs the written data valid from 25ns before the rise of WE through to the rise of WE, and unlike the 10ns 6502 read hold, there are no write cycle holds on the AS6C4008 ... and at 5V and 8MHz, it looks like the 6502 will have valid data being written more than 25ns prior to the rise of WE. So as long as 55ns is available for the whole WE cycle, things look OK on that side.
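The read-side slack works out like this (all figures quoted in this and the previous posts, so treat them as assumptions):

```python
# Read-side margin at 8 MHz with CE# generated ahead of the cycle, in ns.
t_cycle = 125.0   # full PHI2 period at 8 MHz
t_ace   = 55.0    # AS6C4008 CE-to-data access time
ce_lead = 10.0    # CE# asserted 10 ns before the falling PHI2 starting the cycle
t_dsr   = 10.0    # 6502 read data setup before the next falling PHI2

data_valid_at = t_ace - ce_lead    # data valid 45 ns into the read cycle
deadline      = t_cycle - t_dsr    # data needed 115 ns into the cycle
print(f"read slack: {deadline - data_valid_at:.0f} ns")   # 70 ns
```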

Edited by BruceMcF

14 hours ago, BruceMcF said:

Looking at the AS6C4008 timings more closely, the timing may be looser if it is just OE and WE that are ANDed with PHI2, with Chip Select generated as soon as it's ready ... then the Read cycle has a maximum of 55ns after CE and 30ns after output enable before valid data is output by the RAM. If CE is generated 10ns before the falling PHI2, there will be valid data from 45ns into the 6502 Read cycle through to the rising RAM OE. That more than meets the minimum 10ns tDSR.

The empirical question is whether the read data hold time is satisfied by the propagation delay of the rising OE and the transition of the RAM from output to high impedance, where no minimum is specified for the latter. Maybe an AHCT part wouldn't actually be helpful on that side of the timing diagram.

The write cycle doesn't look like an issue ... under WE control, it needs the written data valid from 25ns before the rise of WE through to the rise of WE, and unlike the 10ns 6502 read hold, there are no write cycle holds on the AS6C4008 ... and at 5V and 8MHz, it looks like the 6502 will have valid data being written more than 25ns prior to the rise of WE. So as long as 55ns is available for the whole WE cycle, things look OK on that side.

I think playing all the tricks at 8MHz, it comes to around 30ns for combinatorial logic to assert the correct chip select. I guess it is theoretically feasible as long as the address decoding isn't more than 3 gates deep.

I was skimming the AHCT datasheet tonight and another problem occurred to me that I haven't had time to think through in depth: the output drive current of those parts is very weak, 8mA (which is probably why they have lower propagation times; they don't have to switch as much current). That's 1/3 the output current of the ACT parts. I'm not certain that it would be a problem, and I do know for certain that my ability to guesstimate parasitic capacitance sucks, but given the (apparently) long board traces I would be concerned that load capacitance might eat up most of that 30ns of headroom. If the CX16 can run stable at 8MHz, I'm definitely dragging one into work and probing with a fast scope because my intuition is telling me there shouldn't be enough timing margin to do so.
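A back-of-envelope RC check shows how the headroom can erode. The output resistance and load capacitance below are illustrative guesses on my part, not measurements or datasheet values:

```python
import math

# Guessed values: effective output resistance of a weak driver and total
# load (long traces plus a few CMOS inputs). Both are assumptions.
r_out  = 200.0     # ohms
c_load = 100e-12   # farads (100 pF)

# Time for a 0 -> ~2 V rise on a 5 V swing: t = RC * ln(5 / (5 - 2))
t_slew_ns = r_out * c_load * math.log(5.0 / 3.0) * 1e9
print(f"slew to threshold: {t_slew_ns:.2f} ns")   # ~10 ns
```

With guesses like these, a third of the ~30ns headroom is gone before the gate's own propagation delay is even counted, which is the worry.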


3 hours ago, Wavicle said:

I think playing all the tricks at 8MHz, it comes to around 30ns for combinatorial logic to assert the correct chip select. I guess it is theoretically feasible as long as the address decoding isn't more than 3 gates deep.

I was skimming the AHCT datasheet tonight and another problem occurred to me that I haven't had time to think through in depth: the output drive current of those parts is very weak - 8mA (which is probably why they have lower propagation times, they don't have to switch as much current). That's 1/3 the output current of the ACT parts. I'm not certain that it would be a problem, and I do know for certain that my ability to guesstimate parasitic capacitance sucks, but given the (apparently) long board traces I would be concerned that load capacitance might eat up most of that 30ns of headroom. If the CX16 can run stable at 8MHz, I'm definitely dragging one into work and probing with a fast scope because my intuition is telling me there shouldn't be enough timing margin to do so.

Hence I can imagine people playing around with a mix of them if they are on the edge. So if you have a demux with an active-high output that can be wire-ORed, with the result fed through an inverter to get the active-low select, you might prefer that inverter to be ACT for drive, and the demux to be AHCT for speed, since the demux might have only one or two loads.

I don't have any information on HOW they do the select, but it seems to me that if the RAM CS is generated asynchronously to the clock, 6502-bus style, and only the OE/WE are synchronized to the bus, there ought to be enough room to fit the CS latencies of the SRAM into both the read and write cycles.

I guess if you put A8-A12 and A15 through the inputs of a dual 3-1 NAND and A13/A14 through two line driver lines (1 level), OR the four outputs (two levels), and feed that into a 3-8 demux with R/W and PHI2, that decoder (3rd level) gives you separate IO-OE, IO-WE, RAM/ROM-OE and RAM/ROM-WE. The OE and WE slew according to the propagation delay of the final demux, on both pull down and pull up, since the PHI2 transition low will be the driver for pulling whichever write/output enable line low, and the same for PHI2 transition high.

With Memory-OE and Memory-WE not generated for the I/O page, you wouldn't need to worry about the I/O page when generating the Memory CS ... just decode %000xxxxx - %100xxxxx as Low RAM, %101xxxxx as High RAM, and %110xxxxx - %111xxxxx as ROM.
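That decode map comes down to a lookup on the top three address bits. A quick sketch (the I/O page at $9F00 is handled separately by the OE/WE gating, per the above, so it doesn't appear here):

```python
# Memory region from the top three address bits (A15..A13), mirroring the
# %000-%100 / %101 / %110-%111 split described above.
def region(addr: int) -> str:
    top3 = (addr >> 13) & 0b111
    if top3 <= 0b100:
        return "Low RAM"    # $0000-$9FFF
    if top3 == 0b101:
        return "High RAM"   # $A000-$BFFF banked window
    return "ROM"            # $C000-$FFFF

print(region(0x1234), region(0xA123), region(0xC000))
# Low RAM High RAM ROM
```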

 
