How to implement slow chips on expansion cards


Lorin Millsap

Just for fun, and because it has been asked a few times, I want to explain what would be required to use a VIC-II or SID, or really any chip that is too slow under normal circumstances. I’m not saying either of these is really a useful idea, but it makes a good introduction to how to use such things on expansion cards. Up front: this article is going to use layman’s terms, but it is still geared toward those interested in hardware.

 

Since it’s the less complex option, I’ll start with how you would use a SID chip. Probably the best point to emphasize, because it affects any chip you want to interface to the bus, is the access window. The CPU in the X16 uses half-cycle access for memory and IO. This means that in each clock cycle, the first half of the cycle is spent doing internal processor work and setting things up for the actual bus access. During the second half of the cycle the CPU performs either a read or a write access to memory or IO. We will refer to this time as the access window, and all chips on this bus need to be able to respond to read/write operations within that window.

 

So how long is this access window? It is measured in nanoseconds. To give you some context, a clock running at 1 MHz has a period of exactly 1000 nanoseconds. So if our CPU were running at 1 MHz, our access window would be slightly less than half that, about 500ns. If we increase to 2 MHz, that access window decreases to about 250ns. However, this is a rough approximation, because there are other factors that affect how much of that access window is really usable. There is address decoding, aka glue logic, that looks at which address the CPU is trying to access and then selects or enables the appropriate chip. This process is not instantaneous, and the time it takes to get from the inputs (address lines from the CPU) to the outputs (chip-select lines from the logic) is what we will refer to here as propagation delay. This delay is also measured in nanoseconds, and for the sake of simplicity we are going to state this value as 20ns.

 

With this value, our usable access window at 1 MHz is closer to 480ns, and at 2 MHz it would be closer to 230ns. Now consider the 8 MHz that the X16 runs at: we get a nominal window of about 62 nanoseconds but an effective window of about 42 nanoseconds.
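If you want to plug in other clock speeds, the same arithmetic is easy to sketch out. A minimal example, treating the 20ns decode delay as the placeholder figure above rather than a measured spec:

```python
# Rough access-window arithmetic from the discussion above.
# The 20ns decode delay is a placeholder assumption, not a measured figure.
def usable_window_ns(bus_clock_mhz, decode_delay_ns=20):
    cycle_ns = 1000.0 / bus_clock_mhz   # full clock period in nanoseconds
    half_cycle_ns = cycle_ns / 2        # the CPU only uses the second half for the bus access
    return half_cycle_ns - decode_delay_ns

for mhz in (1, 2, 4, 8):
    print(f"{mhz} MHz bus: ~{usable_window_ns(mhz):.0f} ns usable")
# 1 MHz -> ~480 ns, 2 MHz -> ~230 ns, 4 MHz -> ~105 ns, 8 MHz -> ~42 ns
```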

 

So what this means is that any chip that connects to the bus, be it RAM, ROM, IO chips, video chips, audio chips, etc., must be able to respond to a read or write in less than 42 nanoseconds in order to work reliably. There are ways around this and I will get to them later, but the point stands: any chip that connects directly to the system bus must be able to respond within that access window.

 

In the case of a SID chip I’m not sure what its access time is, but it’s likely in the 150-200ns range. So it would work reliably at 1-2 MHz and might work at 4 MHz, but it won’t work reliably on an 8 MHz bus without some type of buffering.
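You can also work the same numbers backwards to estimate the fastest bus clock a given part can keep up with. A rough sketch, again assuming the 20ns decode delay:

```python
# Estimate the fastest bus clock a chip can tolerate, given its access time.
# Assumes the same ~20ns decode delay and half-cycle access window as above.
def max_bus_clock_mhz(chip_access_ns, decode_delay_ns=20):
    # The half-cycle window must cover the decode delay plus the chip's access time.
    return 1000.0 / (2 * (chip_access_ns + decode_delay_ns))

for access in (150, 200, 350):
    print(f"{access} ns part: good to roughly {max_bus_clock_mhz(access):.1f} MHz")
# 150 ns -> ~2.9 MHz, 200 ns -> ~2.3 MHz, 350 ns -> ~1.4 MHz
```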

 

So if you were to implement one, you have several options for how to go about it. If you aren’t concerned with reading the chip, then you could use latches. You would need to latch the data and the address and implement some type of timer to extend the hold time. What you are doing in this case is that instead of interfacing to the SID directly, you are interfacing to a simple latch which just captures the relevant address lines and the 8-bit data. This buffer then presents those values to the SID for as long as needed.
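To make that concrete, here is a purely conceptual model of the write-only latch scheme. The names are hypothetical; in real hardware this would be a couple of 74-series latches plus a small hold timer, not software:

```python
# Purely conceptual model of the write-only latch scheme (hypothetical names).
class SidWriteLatch:
    def __init__(self):
        self.addr = 0        # latched register select (the SID has address lines A0-A4)
        self.data = 0        # latched 8-bit data value
        self.pending = False

    def capture(self, addr, data):
        # This happens inside the ~42 ns X16 access window; a fast latch has no
        # trouble meeting that, even though the SID itself could not.
        self.addr, self.data, self.pending = addr & 0x1F, data & 0xFF, True

    def replay_to_sid(self):
        # The card-side timer then holds these values on the SID's pins and asserts
        # its chip select for as long as the slow chip needs.
        if self.pending:
            self.pending = False
            return self.addr, self.data
        return None

latch = SidWriteLatch()
latch.capture(addr=0x18, data=0x0F)   # e.g. a write to the SID volume register
print(latch.replay_to_sid())          # -> (24, 15), held for the SID at its own pace
```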

 

Enabling the chip to be read requires some additional circuitry. This method can actually be used for both reads and writes, and it involves halting the CPU to extend the access window across one or more CPU cycles. Basically, when an access occurs, the RDY line needs to be pulled low while the BE (Bus Enable) line is pulled high. This causes the CPU to be halted in its current state. Using binary counters we can hold this state for as many cycles as we need. Keep in mind that for an 8 MHz bus, if we extend the access by just one cycle what we actually get is 42ns + 125ns for a total of 167ns, and if we need more time we get an additional 125ns for each cycle we extend that window by. Also keep in mind this method does require the use of bi-directional transceivers or equivalent.
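As a quick sanity check on those numbers, here is a sketch that works out how many cycles of stretching a given part would need, using the same approximate 42ns and 125ns figures:

```python
import math

# How many wait states does a chip need on the 8 MHz bus?
# Uses the approximate 42 ns effective window and 125 ns per extra cycle from above.
def wait_states_needed(chip_access_ns, base_window_ns=42, cycle_ns=125):
    shortfall = chip_access_ns - base_window_ns
    return max(0, math.ceil(shortfall / cycle_ns))

for access in (40, 150, 200, 350):
    print(f"{access} ns part: {wait_states_needed(access)} wait state(s)")
# 40 ns -> 0, 150 ns -> 1 (42 + 125 = 167 ns), 200 ns -> 2, 350 ns -> 3
```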

 

Using the VIC-II chip is quite a bit trickier. The main issue is that the VIC-II needs access to memory. Since there is no way you could make it play nicely with the X16 memory space, you’d need to give it its own memory. You would need to design some kind of circuit that would act as a bus bridge. This bridge would have to facilitate both reading and writing to the VIC-II chip itself, but it would also have to act as an indirect memory window. This is doable but is not a trivial undertaking. I’m not suggesting anyone actually tackle this; I’m just saying that it is possible and that this is what would have to take place.

 

I feel I should also add, though, that doing these things would not make the X16 capable of running C64 software. I’m merely laying out what would need to take place to make it possible to interface with these chips, not what would need to take place to emulate another system. This is just a fun thought experiment and a good opportunity for me to explain how it would be done.

 

One follow-up, too: this is by no means a definitive guide on the actual timing requirements. The reality is the timing is probably more forgiving than what I’ve stated here; my 20ns example is more like a worst-case scenario. We don’t have hard figures on the actual timing yet; for that we need to finalize all details of the board and measure multiple samples at different temperatures and voltage ranges.

 

 


Thank you; I've had a project in mind for quite a while, one where the retro chip I'll be using runs at 4 MHz. I'll need to check the specs again to see what kind of window I'm dealing with, but I'm glad there may be a simple solution to at least make it work, even with the performance hit.


5 hours ago, Lorin Millsap said:

We don’t have hard figures on the actual timing yet; for that we need to finalize all details of the board and measure multiple samples at different temperatures and voltage ranges.

While a full characterization at some future point will be great, there is an easy way to help expansion card designers now.

1. Specify the voltage (5V, +/- some tolerance if you want to get fancy)

2. Specify the clock waveform (8MHz, 50/50 duty cycle I guess?)

3. Publish a schematic of the logic between the 65C02 and the expansion slots.  Everything else can be omitted.  There should be no secret sauce here, it's just simple glue logic.

I think that covers the most important stuff.  I'm sure someone will chime in if I forgot something.  Anyone capable of designing an expansion card should also be able to derive the timing requirements from the above information and device datasheets.

Obviously the logic and timing is subject to change.  Anyone designing to a moving target needs to embrace the possibility of breakage in the future.



 

Ok. Sure.

 

1. Voltage is 5V. Tolerance is 10%.

2. 8 MHz clock with a 50/50 duty cycle. I do not have specs for the rise/fall times at the moment.

3. The only glue logic on the expansion bus is the IO address decoder. I’m not sure what the real propagation delay is but it’s safe to assume 20ns.

 

 


4 hours ago, Lorin Millsap said:

it’s safe to assume 20ns.

Is it?  It is hard to judge without seeing the logic.  20ns sounds reasonable-ish for typical delays.  It sounds low for worst-case delays. Consider letting expansion card designers decide how aggressive they want to be with the timing.

I am puzzled why you mention BE in your example.  Most expansion cards can ignore BE (leaving it undriven).  Only cards that need to drive the address bus need to drive BE low to take control.



I only mention it because a DMA scheme would pull it low, and the line is present. For wait states you don’t want to use it; for DMA you do.

As to letting designers decide how aggressive to be: we don’t know how close you can push it, but it’s best to meet the criteria we will spec rather than the absolute limit. Some machines will be more forgiving than others depending on factors like the chip run, temperature, humidity, voltage, etc. The requirements are way tighter on 65xx systems than with comparable Intel, Z80, or 68k systems. I’m just trying to say that when people are designing cards they need to take the safety margins into consideration. If you have a chip that needs 50ns to work reliably, which exceeds the 42ns window, it would be better to buffer or plan on wait-stating than to gamble that it might work most of the time.



23 minutes ago, Lorin Millsap said:

As to letting designers decide how aggressive to be, we don’t know how close you can push it but it’s best to meet the criteria we will spec rather than the absolute limit.

I agree, assuming you correctly specify the worst case timing.  Like I wrote before, I am dubious “assume 20ns” meets that criteria.

Sorry, I can’t help being picky about timing, what with it being my livelihood and all.



All I can say is we don’t have a solid figure on that yet. When we know that for certain we will publish it in an official spec sheet with timing charts.



11 hours ago, Lorin Millsap said:

Follow-up on the SID chip access time: I can’t hold this as definitive, but one datasheet lists the access time as 350ns, which means even 2 MHz may not be reliable.



2 MHz should be reliable enough. C128s were sold with both models of SID chip, and those were definitely used at 2 MHz.

It sounds to me like a first task for third-party developers would be to build an 8:1 clock divider, along with some sort of buffer to read from the CPU bus and stretch that cycle on the expansion card side. If we're only talking about writing data (to a SID or VIC), then that should actually work fairly well. I'm just not sure what kind of hardware that would require. I expect a CPLD would be the device of choice, with address and data latches.

I think I even have a process to make it happen, but I know nothing about programming CPLDs.

 



Well, if you are using logic chips, all you need to divide the clock is a binary counter or a few flip-flops. For your buffer, all you need is a few latches. Add a few minor supporting gates and flip-flops and you could do the simple interfaces without any programmable logic at all. Consider that all of our glue logic on the main board is done with common logic chips and no programmable logic. The bus clock is present on the expansion connector.



  • 5 months later...

I am not sure how to proceed if I want to try to do something with the IO bus using chips or other electronics that have a different interface. Could you give me some ideas about what to search for or look into in order to do so? Would using a much faster CPU (on the other end) not manage to deal with the signalling so that a program can handle the expansion slot in real time?



Your question is somewhat vague, so I’m not completely sure what you are asking, but I’ll try to provide an answer.

Chips come in a variety of formats, which can be divided into two basic types: parallel and serial. The expansion bus we are discussing here is only directly compatible with parallel-type interfaces.

Within the parallel-type chips there is typically a data bus, sometimes an address or register select, as well as a Chip Select or Enable, and depending on the chip there may be separate Output Enable and Write Enable lines or a multiplexed R/W line. For chips that require separate OE and WE lines you will need to provide the circuitry to split the CPU R/W line into the required OE and WE signals.
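As an illustration of that split, here are the usual glue-logic equations written out as a small sketch. The conventions assumed here are that R/W high means read and that the strobes are only valid while the clock's access phase and the chip select are active; real OE/WE pins are normally active-low:

```python
# Conceptual glue-logic equations for splitting the 65xx R/W line into separate
# read and write strobes. Signals are modelled as booleans where True = asserted.
def derive_strobes(rw_is_read, phi2_high, chip_selected):
    oe = chip_selected and phi2_high and rw_is_read          # output enable: reads only
    we = chip_selected and phi2_high and (not rw_is_read)    # write enable: writes only
    return oe, we

# Example: a selected chip during the access half of the cycle, CPU writing.
print(derive_strobes(rw_is_read=False, phi2_high=True, chip_selected=True))  # (False, True)
```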

As to asking whether a much faster CPU can be used, I have tried in other articles to explain in layman’s terms why this is an issue. It can be mitigated once you understand what the issue is, but it’s not as simple as just connecting the buses together.

To grasp this you need to think in terms of timing. For my example I’ll use an 8 MHz 6502 and a 48 MHz AVR chip, and for simplicity’s sake we will assume only a single register, which can be read and written.

How would you interface it? Well, for starters you would need to assign 8 of the IO pins as the data bus. Next, the CS line coming from the X16 needs to be set up as an external interrupt on the AVR. Since we have only a single register you will not need any address lines, and we can use the X16 R/W line to tell the AVR whether we are reading or writing. So you need 10 IO pins total on the AVR.

The X16 can now either read or write the single register we are presenting. To understand the timing that will take place, we have to allow a margin at the edges because the clocks are not synchronous. You need a margin in the design anyway, but more if you cannot guarantee synchronization.

So what happens, in timing terms, when you attempt to write the register? We are going to do some quick math to determine what our access windows are. For a 6502 at 8 MHz you take 1000 ns (nanoseconds) and divide it by 8 to get the length of each cycle in nanoseconds; in this case it’s 125 ns. However, the 6502 performs accesses in half cycles rather than full cycles, so we need to divide that figure in half, which gives us 62.5 nanoseconds for the access. Next we need to solve for the AVR at 48 MHz. The same equation, 1000ns divided by 48, gives us about 20 nanoseconds per cycle.

So let’s go step by step through what will happen, starting at the beginning of the access window (when the CS line is triggered). This will invoke an interrupt on the AVR. The AVR will need to complete its current instruction, so more or less the first 20ns of your 62ns window is taken up. Next, the AVR is going to check whether this is a read or a write access and branch accordingly. This takes a full cycle, so now you are 40ns into the access. Next the AVR needs to store the value on the data bus, which uses another cycle. You are now at 60ns into the cycle and the X16 has successfully written to the register. But there is a caveat: this assumes the clock edges are closely aligned. In reality they could be up to 20ns off, in which case you could add nearly 20ns before the CS/IRQ is registered, which would push all the actions that follow back by the same amount. Since the data on the bus is only guaranteed to be valid for 10ns after the end of the cycle, this example might work, but it wouldn’t be guaranteed 100% of the time. So it is clear that during this access the AVR only has time to execute two or three instructions before the access window ends and the data being written is no longer guaranteed to be valid.
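Tallying those steps against the window makes the problem obvious. A rough sketch of the same budget, where the one-cycle-per-step costs are the assumptions stated in this walkthrough rather than datasheet figures:

```python
# Back-of-the-envelope cycle budget for the 48 MHz AVR write example above.
X16_WINDOW_NS = 1000.0 / 8 / 2     # ~62.5 ns half-cycle access window at 8 MHz
AVR_CYCLE_NS  = 1000.0 / 48        # ~20.8 ns per AVR clock cycle

steps = [
    ("finish current instruction / enter the IRQ", 1),
    ("test the R/W pin and branch",                1),
    ("capture the value on the data bus",          1),
]

elapsed = 0.0
for name, cycles in steps:
    elapsed += cycles * AVR_CYCLE_NS
    print(f"{name}: {elapsed:.0f} ns of the ~{X16_WINDOW_NS:.1f} ns window used")
# Roughly the whole window is consumed: any clock skew between the two systems blows the budget.
```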

What if the X16 needs to read? In terms of steps, the AVR needs to get into its IRQ, branch based on the R/W state, and provide the requested data in less than 62ns. But it gets tougher here: the data needs to be valid and stable for at least 10ns before the end of the access window. In this scenario it is just barely possible if the clocks are correctly lined up, but otherwise it is a timing violation and will not work correctly.

So how do we mitigate this?

One option is a faster AVR, and by that I mean much faster, probably in the 60-200 MHz range. What about PIC, ARM, or Propeller? Well, the same things apply to each of those. Each architecture will have its own limitations and some may be better suited, but it’s still going to be a matter of how many steps can be completed during that access window.

Another way of mitigating it is by introducing wait states. This would be added circuitry, though in some cases it could be implemented by the AVR or equivalent. Basically, when the CS line is asserted, the X16’s RDY line needs to be pulled low and held low as long as required. The issue with this is that the X16 wasn’t exactly designed with wait states in mind. In theory it would probably work fine, but as of this writing it hasn’t been tested. If this wait state were implemented in logic it could be done with a counter and a flip-flop. If it were done by the AVR you could use one of its IO lines connected to RDY to perform the same task: you would pull that line low first thing in the AVR and then switch the line back to high when the code has completed. This would have the benefit of allowing as much access time as needed, but it has the drawback of slowing down the X16 CPU, though in most cases it would probably only be a single cycle of delay.
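In pseudocode, the AVR-driven version of that wait state would look something like the following. The Bus class and its method names are hypothetical stand-ins, not a real AVR API; the only point being illustrated is the ordering of operations:

```python
# Pseudocode for an AVR-driven wait state. The Bus class below is a stand-in for
# real pin-twiddling firmware; the key idea is: freeze the CPU first, release it last.
class Bus:
    """Hypothetical stand-in for the AVR's IO pins."""
    def drive_low(self, pin): print(f"{pin} pulled low")
    def release(self, pin): print(f"{pin} released")
    def read_pin(self, pin): return True          # pretend R/W is high (a read)
    def read_port(self, port): return 0x42        # pretend value written by the X16
    def drive_port(self, port, value): print(f"{port} <= {value:#04x}")

register_value = 0x00

def on_chip_select(io):
    global register_value
    io.drive_low("RDY")               # halt the 65C02 mid-access as early as possible
    if io.read_pin("RW"):             # R/W high: the X16 is reading from us
        io.drive_port("DATA", register_value)
    else:                             # R/W low: the X16 is writing to us
        register_value = io.read_port("DATA")
    io.release("RDY")                 # let the stretched access cycle complete

on_chip_select(Bus())
```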

As an important note, I am making some assumptions about the AVR, and that other micros will be similar. I am assuming that entering the IRQ only takes a single cycle, that you can execute a branch instruction based on the value of an IO pin, and that all these instructions take only one cycle to complete. If any of these take longer, then you definitely have to invoke wait states.

By contrast, most of these same tasks can easily be done by a sufficiently fast CPLD or FPGA, which is why they are better suited for use on the system bus than a microcontroller. Microcontrollers are better off interfacing to the user port, where the timing is much more forgiving.



On 1/2/2021 at 2:39 AM, Lorin Millsap said:

Your question is somewhat vague, so I’m not completely sure what you are asking, but I’ll try to provide an answer.

Thanks for your reply. I had to spend a bit of time on research before replying to you.

I agree that the scope I presented was a bit vague. What I want is to enable communication with a faster device using the I/O area. This faster device is a SoC of some kind that can provide stuff like wireless connectivity etc. All IO access is initiated from the X16 side. So in this setting the X16 would be the slow device. The decoding of the address bus (from 16 bits down to 5 bits + CS) will be handled in hardware.

When I was thinking about trying to implement this in software, I wanted to use a much faster system in the GHz range, like a Raspberry Pi Compute Module 4. I would need 5 pins for addressing, 8 pins for data and then a few for system signals (CS, clock, RWB). Using a 1.5 GHz CPU would give a clock cycle of 0.67ns, compared to 125ns. I thought it was possible to manage this in code.

I have spent some time reading and trying to understand CPLDs and FPGAs. A CPLD seems (in theory) fairly easy to implement, and I have some ideas to work on. An FPGA seems somewhat harder to implement but gives me more options that I am not sure I need. For both of these solutions the main issue is how to develop and "compile" the code. The toolchains needed often seem to cost lots of money, with a few exceptions, and documentation and examples on how to develop are hard to come by.

I could not find what model of FPGA is used for VERA. If I want to look further into this it would be nice to at least try to work with the same vendor that is used by VERA.

Should we think about a "reference design" for how to connect an expansion cartridge using a CPLD or FPGA? I think some of this would come from how VERA is designed and programmed, and I can't wait for the schematics and code to be published.


Just because a Raspberry Pi is in the GHz range doesn’t mean its IO is. I couldn’t find any verifiable info on that, but it looks like a Pi 3 can theoretically do around 66 MHz. So it might be possible, depending on whether it can read entire ports in a single cycle or if it has to read or set one pin at a time.

 

As to CPLD vs FPGA, there isn’t a huge difference between them, and as to which FPGA the VERA uses, that’s not a big deal either. Most FPGAs are going to be chosen based on your actual needs, i.e., how many macrocells you require.

 

 

