
What is VERA - physically?


TomXP411

I know this has been answered somewhere else, but the information has been buried beneath a mountain of other comments. 

@Frank van den Hoef

Can you talk about the FPGA that's actually being used for VERA?

All this talk about FPGAs and what can and can't be done has got me itching to know more - and possibly to do a deep dive into the technology, myself. 

 

 


On 10/12/2021 at 7:06 PM, TomXP411 said:

FPGA that's actually being used for VERA

Lattice ICE40UP5K in the 48-pin QFN. https://www.latticesemi.com/en/Products/FPGAandCPLD/iCE40UltraPlus

The datasheet: https://www.latticesemi.com/view_document?document_id=51968

Lattice's iCE40 family is a good place to start exploring FPGAs.  They are a good deal simpler than most offerings from Xilinx and Intel.


If I treat VERA as a black box, how is it accessed from outside (by the 6502 CPU)? Does it have a few address inputs and 8-bit data in/out pins that are connected to the CPU bus just like RAM or ROM?

How does this work internally? Do some FPGAs already have a processor built in, or are there standard libraries available for the FPGA programming software that "wire" certain virtual processors inside the FPGA?

Do I understand correctly from the datasheet that level shifters are needed to communicate with a 5V bus because it operates at 1.2V?

Edited by Ju+Te

On 10/13/2021 at 12:43 PM, Ju+Te said:

If I treat VERA as a black box, how is it accessed from outside (by the 6502 CPU)? Does it have a few address inputs and 8-bit data in/out pins that are connected to the CPU bus just like RAM or ROM?

How does this work internally? Do some FPGAs already have a processor built in, or are there standard libraries available for the FPGA programming software that "wire" certain virtual processors inside the FPGA?

Do I understand correctly from the datasheet that level shifters are needed to communicate with a 5V bus because it operates at 1.2V?

There are different FPGAs with differing capabilities. Not all are such low power, though I think in most cases, yes, you would need level shifters to interact with a 5V bus.

For communication between a physical CPU and the FPGA (or really, anything interacting with the FPGA), your HDL defines a number of externally exposed IO lines to serve whatever purpose you want. For example, I have a Nexys 4 DDR board that exposes 40 pins to the outside world (and more IO is assigned to other IO devices on the board itself, such as switches, 7 segment displays, LEDs, network, VGA, etc, etc, etc).

Some FPGAs have a CPU sitting next to the FPGA fabric, or IP is available to embed a soft-core CPU into the fabric. Others just provide the FPGA, and a processor (if desired) has to be created from scratch or sourced from another project or offering.


On 10/13/2021 at 2:43 PM, Ju+Te said:

If I treat VERA as a black box, how is it accessed from outside (by the 6502 CPU)? Does it have a few address inputs and 8-bit data in/out pins that are connected to the CPU bus just like RAM or ROM?

Pretty much.  I don't remember seeing the interface details published, but it should be similar to a 65C22 VIA.

On 10/13/2021 at 2:43 PM, Ju+Te said:

How does this work internally?

It's a bespoke digital design.  The logic building blocks inside the ICE40UP5K are pretty simple, mostly D flip-flops in various flavors and 4-input look-up table cells, each of which can implement any arbitrary logic function of its 4 inputs.  The full cell library is specified here: https://www.latticesemi.com/view_document?document_id=52206

On 10/13/2021 at 2:43 PM, Ju+Te said:

Do I understand correctly from the datasheet that level shifters are needed to communicate with a 5V bus because it operates at 1.2V?

Almost.  1.2V is the core voltage.  There is a second power rail for the IO, 3.3V for Vera.  The FPGA has internal level shifters between the core and IO rails.  External level shifters are needed to interface the 3.3V Vera IO to the 5V X16 logic.


On 10/13/2021 at 2:43 PM, Ju+Te said:

If I treat VERA as a black box, how is it accessed from outside (by the 6502 CPU)? Does it have a few address inputs and 8-bit data in/out pins that are connected to the CPU bus just like RAM or ROM? ...

From what they've said before, exactly ... 8 I/O pins connect to the data bus, 5 I/O pins connect to A0-A4, and from that I would guess three more pins allocated for chip select, R/W and PHI2, with all the lines from the 6502 level shifted to 3.3V.


On 10/13/2021 at 5:16 PM, BruceMcF said:

From what they've said before, exactly ... 8 I/O pins connect to the data bus, 5 I/O pins connect to A0-A4, and from that I would guess three more pins allocated for chip select, R/W and PHI2, with all the lines from the 6502 level shifted to 3.3V.

I think the same thing. More than that would mean a lesson in 65x logic design, which could be a thread all of its own. 😃

 


On 10/14/2021 at 12:53 AM, TomXP411 said:

I think the same thing. More than that would mean a lesson in 65x logic design, which could be a thread all of its own. 😃

One interesting point is whether its timing as designed will play nice with a 16MHz Z80 bus, since a 32-byte register address range fits well with the 256-byte I/O address space of the Z80. Use a 2->4 decoder to select Vera on I/O (a6,a7)=%00, select tri-state hex latches on %01 and %10, tie the Vera SPI select to the output enable of the %01 latch to select from four SPI devices, tie the other latch to select a 32K memory bank on a 512KB SRAM, and it could make a really cute little CP/M Plus system to play with ... two SD cards, one UART and an I2C bus master for parallel port, keyboard, etc.
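
As a rough illustration of the port decode described above, here is a minimal Python sketch. It assumes an 8-bit Z80 I/O port where A7 and A6 feed the 2->4 decoder and A0-A4 select one of Vera's 32 registers. The device names are made up for the example, and A5 is simply ignored, so in this sketch each register would appear twice within its 64-port block.

```python
# Hypothetical Z80 I/O port decode for the scheme sketched in the post.
# Device names are illustrative only; they do not come from any real design.
SELECT_NAMES = {0b00: "VERA", 0b01: "LATCH_A", 0b10: "LATCH_B", 0b11: "UNUSED"}


def decode_port(port: int) -> tuple[str, int]:
    """Return (selected device, register index) for an 8-bit I/O port."""
    if not 0 <= port <= 0xFF:
        raise ValueError("this sketch only models 8-bit I/O ports")
    select = (port >> 6) & 0b11   # A7..A6 drive the 2->4 decoder
    register = port & 0x1F        # A4..A0 pick one of 32 registers
    return SELECT_NAMES[select], register


for port in (0x00, 0x1F, 0x25, 0x40, 0x80):
    device, reg = decode_port(port)
    print(f"port ${port:02X} -> {device}, register {reg}")
```

Whether Vera's bus timing actually suits a Z80 I/O cycle is a separate question that this decode sketch does not address.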


On 10/14/2021 at 8:16 AM, BruceMcF said:

One interesting point is whether its timing as designed will play nice with a 16MHz Z80 bus, since a 32-byte register address range fits well with the 256-byte I/O address space of the Z80. Use a 2->4 decoder to select Vera on I/O (a6,a7)=%00, select tri-state hex latches on %01 and %10, tie the Vera SPI select to the output enable of the %01 latch to select from four SPI devices, tie the other latch to select a 32K memory bank on a 512KB SRAM, and it could make a really cute little CP/M Plus system to play with ... two SD cards, one UART and an I2C bus master for parallel port, keyboard, etc.

If I recall, the system clock is also driven by VERA. In that case, there would be firmware changes needed for VERA to live on an 8080 style system. Either VERA would need to generate the clock, or you'd have to drive the I/O side independently from the GPU side of the core. In either event, you might as well modify the bus sequencing to account for 8080 style I/O while you're at it. 

 

 


On 10/14/2021 at 3:33 PM, TomXP411 said:

If I recall, the system clock is also driven by VERA. ...

I had once been under that impression, but AFAIU, no, the system clock is on the board, and the bus interface is not synchronized with the internal Vera clock.

I believe that driving the system clock from a countdown of the internal FPGA clock may have been one idea for handling the interference between the bus interface they added to the Gameduino and the actions of the J1 coprocessor that the Gameduino included. Since Vera doesn't have an embedded processor core, it doesn't have the same problem.

Edited by BruceMcF

On 10/14/2021 at 1:33 PM, TomXP411 said:

If I recall, the system clock is also driven by VERA. In that case, there would be firmware changes needed for VERA to live on an 8080 style system. Either VERA would need to generate the clock, or you'd have to drive the I/O side independently from the GPU side of the core. In either event, you might as well modify the bus sequencing to account for 8080 style I/O while you're at it. 

 

 

This is incorrect. VERA is designed so that its bus interface is independent of the host clock, because we want it to work on other systems. It does not generate the system clock.


On 10/19/2021 at 1:41 PM, Lorin Millsap said:

This is incorrect. VERA is designed so that its bus interface is independent of the host clock, because we want it to work on other systems. It does not generate the system clock.

Thanks. I knew I remembered some conversation about that, but it might have been one of those passing "this is what we're thinking of doing" things.

 


On 10/19/2021 at 5:31 PM, TomXP411 said:

Thanks. I knew I remembered some conversation about that, but it might have been one of those passing "this is what we're thinking of doing" things.

As I mentioned, it may have been a proposal ... and quite possibly not from the design team ... to deal with a Gameduino problem that simply doesn't exist with Vera, as Vera has no embedded co-processor core. That would have been near the end of the "Gameduino video display era".

That discussion would have been long before this site was built, so it would be lost in the morass that is the FB discussion system.

h/t @Lorin Millsap for confirming that the Vera "register" access is not synchronous with the Vera VGA dot clock.

Edited by BruceMcF

  • 2 weeks later...
On 10/19/2021 at 3:12 PM, BruceMcF said:

h/t @Lorin Millsap for confirming that the Vera "register" access is not synchronous with the Vera VGA dot clock.

I think what was said is that VERA register access is async to the host system clock. It likely is synchronous to either the PLL-derived dot clock or the system clock. My guess would be it has a 48MHz clock, like the x8, giving it a 21ns cycle time which would make VERA accesses look very similar to the 40ns SRAM.

 

On 10/13/2021 at 5:16 PM, BruceMcF said:

From what they've said before, exactly ... 8 I/O pins connect to the data bus, 5 I/O pins connect to A0-A4, and from that I would guess three more pins allocated for chip select, R/W and PHI2, with all the lines from the 6502 level shifted to 3.3V.

PHI2 isn't needed, so the "obvious" 8th pin would be IRQB. However, unlike the data bus, which is bidirectional, or A0-A4, CS, and RW, which are input only (to VERA), IRQB is output only. The 65C02 datasheet says Vih must be at least Vcc * 0.7 (3.5V), which is close to 3.3V, but 3.5V isn't an approximate value; it is an absolute minimum. So while driving IRQB directly from VERA at 3.3V will work most of the time, it isn't guaranteed to trigger reliably. (Oh man, was that embarrassing: IRQB is active low, it says so right in the signal name. It shouldn't be a problem driving it from VERA as long as there is no issue with the pin normally being pulled up to 5V.)
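
The threshold arithmetic can be checked with a small Python sketch. The Vih figure (0.7 * Vcc) is the one quoted above; the Vil figure (0.3 * Vcc) is an assumption added here for illustration and should be checked against the actual datasheet.

```python
# Input threshold check for a 5V 65C02 input driven from 3.3V logic.
VCC = 5.0
VIH_MIN = 0.7 * VCC   # minimum voltage guaranteed to read as high (quoted above)
VIL_MAX = 0.3 * VCC   # ASSUMED maximum voltage guaranteed to read as low


def guaranteed_high(volts: float) -> bool:
    return volts >= VIH_MIN


def guaranteed_low(volts: float) -> bool:
    return volts <= VIL_MAX


print("3.3V guaranteed high:", guaranteed_high(3.3))  # 3.3V is below 3.5V
print("5.0V guaranteed high:", guaranteed_high(5.0))  # released line pulled up to 5V
print("0.0V guaranteed low: ", guaranteed_low(0.0))   # asserted active-low IRQB
```

Because IRQB is active low, the state that matters for triggering is the low one, and the high level would come from the pull-up rather than from the 3.3V output.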

Edited by Wavicle

On 10/27/2021 at 10:07 PM, Wavicle said:

I think what was said is that VERA register access is async to the host system clock. It likely is synchronous to either the PLL-derived dot clock or the system clock. My guess would be it has a 48MHz clock, like the x8, giving it a 21ns cycle time which would make VERA accesses look very similar to the 40ns SRAM.

 

PHI2 isn't needed, so the "obvious" 8th pin would be IRQB. However, unlike the data bus, which is bidirectional, or A0-A4, CS, and RW, which are input only (to VERA), IRQB is output only. The 65C02 datasheet says Vih must be at least Vcc * 0.7 (3.5V), which is close to 3.3V, but 3.5V isn't an approximate value; it is an absolute minimum. So while driving IRQB directly from VERA at 3.3V will work most of the time, it isn't guaranteed to trigger reliably. (Oh man, was that embarrassing: IRQB is active low, it says so right in the signal name. It shouldn't be a problem driving it from VERA as long as there is no issue with the pin normally being pulled up to 5V.)

VERA has a 50MHz clock.


On 10/27/2021 at 9:07 PM, Wavicle said:

PHI2 isn't needed, so the "obvious" 8th pin would be IRQB

Looking at screen caps again, I'm going to amend my guess and say that I think VERA probably has two CS# pins. There's a CD74ACT138E that I assume is being used to generate the IO CS# signals and I don't see anything to combine the $9F2x and $9F3x range signals.

I can also see VERA has two SN74LVC4245A bus transceivers, which makes me think maybe VERA IO reads are fully asynchronous. Synchronizing to the 50MHz clock and having the ability to respond on the same clock cycle the bus is sampled still means a worst case of say 39ns (if CS# was asserted 1ns after the clock edge VERA samples on) and adding to that the delays crossing two bus transceivers (2 * 6ns), the mux (10ns), and the read hold time (10ns), the result exceeds the 62ns pulse width. This would probably hit SD card access the hardest since graphics operations tend to be write heavy.
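
The sum behind that conclusion can be laid out as a short Python sketch. All of the figures are the ones quoted above; the constant and function names are just labels chosen for this example.

```python
# Worst-case synchronous read path, using only the figures quoted above.
SYNC_WORST_CASE_NS = 39   # CS# lands 1 ns after the edge VERA samples on
TRANSCEIVER_NS = 6        # delay per bus transceiver crossing
CROSSINGS = 2             # in through one transceiver, out through the other
MUX_NS = 10               # address decode mux
READ_HOLD_NS = 10         # read hold time
PHI2_HIGH_NS = 62         # PHI2 high pulse width


def read_path_ns() -> int:
    """Total worst-case delay before valid data reaches the CPU."""
    return SYNC_WORST_CASE_NS + CROSSINGS * TRANSCEIVER_NS + MUX_NS + READ_HOLD_NS


def slack_ns() -> int:
    """Positive means the read fits in the pulse; negative means it does not."""
    return PHI2_HIGH_NS - read_path_ns()


print(f"read path: {read_path_ns()} ns, slack: {slack_ns()} ns")
```

With these numbers the path comes to 71ns against a 62ns pulse, which is the shortfall the post is pointing at.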


On 10/29/2021 at 6:51 PM, Wavicle said:

Looking at screen caps again, I'm going to amend my guess and say that I think VERA probably has two CS# pins. There's a CD74ACT138E that I assume is being used to generate the IO CS# signals and I don't see anything to combine the $9F2x and $9F3x range signals.

I can also see VERA has two SN74LVC4245A bus transceivers, which makes me think maybe VERA IO reads are fully asynchronous. Synchronizing to the 50MHz clock and having the ability to respond on the same clock cycle the bus is sampled still means a worst case of say 39ns (if CS# was asserted 1ns after the clock edge VERA samples on) and adding to that the delays crossing two bus transceivers (2 * 6ns), the mux (10ns), and the read hold time (10ns), the result exceeds the 62ns pulse width. This would probably hit SD card access the hardest since graphics operations tend to be write heavy.

Since the IO memory mapped "slot" design is eight slots of up to 32 addresses each, the CS is likely for the full $9F20-$9F3F range and A0-A4 as register addresses.
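
A minimal Python sketch of that slot layout follows. It assumes the eight slots start at $9F00, which is what makes $9F20-$9F3F come out as slot 1; the function name is made up for the example.

```python
# Sketch of the IO slot layout: eight slots of 32 addresses each.
IO_BASE = 0x9F00   # ASSUMED start of the IO area, so $9F20-$9F3F is slot 1
SLOT_SIZE = 32
SLOT_COUNT = 8


def decode_io(addr: int):
    """Return (slot, register) for an IO address, or None if outside the IO area."""
    offset = addr - IO_BASE
    if not 0 <= offset < SLOT_SIZE * SLOT_COUNT:
        return None
    return offset >> 5, offset & 0x1F   # upper bits pick the slot, A0-A4 the register


for addr in (0x9F20, 0x9F3F, 0x9F40, 0x8000):
    print(f"${addr:04X} -> {decode_io(addr)}")
```

Under this assumption a single chip select covers the whole $9F20-$9F3F range and only A0-A4 need to reach Vera.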

It seems like one 74x4245 can be used for the eight data lines, with Data Direction tied to R/W and output enable tied to CS-IO1, and the other one fixed in the 5V->3.3V direction for the address and control lines, passing CS-IO1, A0-A4, R/W and possibly one more signal through to Vera.

If the bus side interface is asynchronous to the Vera internal clock, then using PHI2 low when CS-IO1 is low would be a reliable state indicating A0-A4 lines valid, while doing a count from CS-IO1 dropping low might be less robust, since the delay from the rising edge of PHI2 to A0-A4 being valid may not be synchronous with the 50MHz clock.

So if there are enough 50MHz cycles within a single 8MHz /PHI2 phase to allow use of CS-IO1=PHI2=0 as a valid address line state, adding PHI2 as the eighth bus control line would be simplest. If not, then the worst case from the earliest possible detection of the drop of CS-IO1 to the latest possible valid A0-A4 lines could be used with a counter of 50MHz cycles.

Edited by BruceMcF

On 11/2/2021 at 8:31 AM, BruceMcF said:

Since the IO memory mapped "slot" design is eight slots of up to 32 addresses each, the CS is likely for the full $9F20-$9F3F range and A0-A4 as register addresses.

You are correct - I'm not sure why I thought each IO had 16 addresses. It might be because the VIAs share a single 32 address range.

On 11/2/2021 at 8:31 AM, BruceMcF said:

If the bus side interface is asynchronous to the Vera internal clock, then using PHI2 low when CS-IO1 is low would be a reliable state indicating A0-A4 lines valid, while doing a count from CS-IO1 dropping low might be less robust, since the delay from the rising edge of PHI2 to A0-A4 being valid may not be synchronous with the 50MHz clock.

So if there are enough 50MHz cycles within a single 8MHz /PHI2 phase to allow use of CS-IO1=PHI2=0 as a valid address line state, adding PHI2 as the eighth bus control line would be simplest. If not, then the worst case from the earliest possible detection of the drop of CS-IO1 to the latest possible valid A0-A4 lines could be used with a counter of 50MHz cycles.

I believe that a problem here is that the state of the address bus is indeterminate from the end of tAH until the end of tADS. It's possible, and legal, for the address lines to briefly hold $9F2x/$9F3x during that time, which would cause the address decoding logic to briefly toggle VERA's CS. For the SRAM and flash NAND components, a glitched CS isn't a big deal because they are completely asynchronous components. However, for VERA, I don't think that's the case. One of negedge CS or posedge PHI2 is going to strobe the design to latch the address bus (and possibly the data bus) and respond to it. For this reason, I would suspect that CS and PHI2 must be ANDed together for VERA. This would match with what Adrian said in his X16 video (VIAs will not work if CS is ANDed with PHI2 because CS will not have propagated to the internal edge-sensitive flipflop in the VIA when the edge of PHI2 arrives and the access will get ignored).
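
To make the glitch argument concrete, here is a toy Python sketch. The trace is invented for illustration and signals are modelled as active-high booleans for readability (the real CS is active low). It counts how many times a device would see its select go active with and without qualifying it by PHI2.

```python
# Toy model: a raw address-decoded select versus one qualified by PHI2.
def vera_selected(addr: int) -> bool:
    return 0x9F20 <= addr <= 0x9F3F


def count_rising(signal) -> int:
    """Count inactive-to-active transitions, starting from inactive."""
    edges, previous = 0, False
    for level in signal:
        if level and not previous:
            edges += 1
        previous = level
    return edges


# (PHI2, address) samples; values invented for this example.
trace = [
    (0, 0x1234),  # previous address still held
    (0, 0x9F34),  # transient value while the address lines settle (glitch)
    (0, 0x8000),  # settled address, not VERA
    (1, 0x8000),  # PHI2 high: real access elsewhere
    (0, 0x9F23),  # next cycle: address settles on VERA
    (1, 0x9F23),  # PHI2 high: real VERA access
]

raw = [vera_selected(addr) for _, addr in trace]
qualified = [bool(phi2) and vera_selected(addr) for phi2, addr in trace]

print("raw select activations:      ", count_rising(raw))
print("qualified select activations:", count_rising(qualified))
```

In this trace the raw select fires twice (once on the glitch), while the PHI2-qualified select fires only for the real access.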

If so, VERA (or any similar IO device that cannot handle CS de-assertions mid-operation) would have from posedge PHI2 until the start of tDSR to put valid data on the data bus. With a 62ns high pulse width, a 10ns tDSR, and 6ns x 2 crossing the bus transceivers, that's exactly 40ns or 2 cycles of a 50MHz clock left over. Because we can't be certain of the alignment between the VERA clock and the system clock, we can assume that if VERA's bus interface were synchronized with its system clock, it would have to be able to respond in a single 50MHz clock in order to stay within timing constraints. If it took 2 clocks (1 to latch the input and 1 to update the data bus), then it is possible that VERA's clock edge would hit just before CS is strobed, and VERA would have to wait a minimum of half a clock (10ns) to check again (if not one full clock), meaning 2.5-3 clocks (50-60ns) to respond. Hence the reason I suspect the bus-facing logic may be completely asynchronous to the system clock or have single cycle latency.
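
The budget in that paragraph works out as follows in a short Python sketch. The figures are the ones given above; the names are labels chosen for the example.

```python
# Read response budget for a CS that is only valid while PHI2 is high.
PHI2_HIGH_NS = 62     # PHI2 high pulse width
T_DSR_NS = 10         # data must be valid this long before the end of the pulse
TRANSCEIVER_NS = 6    # per bus transceiver crossing
CROSSINGS = 2
VERA_CYCLE_NS = 20    # one period of a 50 MHz clock


def response_budget_ns() -> int:
    return PHI2_HIGH_NS - T_DSR_NS - CROSSINGS * TRANSCEIVER_NS


def fits(latency_cycles: float) -> bool:
    """True if a response taking this many VERA clocks meets the budget."""
    return latency_cycles * VERA_CYCLE_NS <= response_budget_ns()


print("budget:", response_budget_ns(), "ns")
for cycles in (1, 2, 2.5, 3):
    print(f"{cycles} clocks -> fits: {fits(cycles)}")
```

Two clocks fit only with perfect alignment; once the half-clock or full-clock alignment penalty is added, the response no longer fits, which is why single cycle latency or a fully asynchronous path is suspected.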

I think this only hits read operations. Writes can be posted. This timing sensitivity would impact SD accesses (which are read intensive) far more than most graphics operations (which are write intensive).

Edited by Wavicle

On 11/3/2021 at 5:48 AM, Wavicle said:

I believe that a problem here is that the state of the address bus is indeterminate from the end of tAH until the end of tADS. It's possible, and legal, for the address lines to briefly hold $9F2x/$9F3x during that time, which would cause the address decoding logic to briefly toggle VERA's CS. ...

Yes, although if it is on CS alone, it in any event requires a countdown from /CS to A0-A4 assured valid, and so that countdown requiring /CS at each tick would cover that.

However, it is simpler to .AND. CS & PHI2, since address lines are valid throughout /PHI2. It might be faster if CS and PHI2 are driven through the transceiver and then the AND takes place inside Vera, but it is more parsimonious of pins to AND CS and PHI2 externally and drive that /VeraSEL through the transceiver.  And a single /VeraSEL is more flexible in interfacing to a variety of buses.

SD should not be an issue reading in a single clock, since the access to the SPI data port can be organized to be entirely parallel to the Vera pipeline generating the next display row using SPRAM data. The possible contention is when there is a read of port A or port B. If the pipeline is organized to access the SPRAM in alternate clock cycles, then the non-contending approach is to use the other clock cycle for system bus to SPRAM operations, which may require two Vera internal clocks. If the pipeline is organized to access the SPRAM in every clock cycle and is simply paused when there is a system bus to SPRAM operation, then perhaps the internal read can be accomplished in a single internal Vera cycle.

 


On 11/3/2021 at 8:16 AM, BruceMcF said:

Yes, although if it is on CS alone, it in any event requires a countdown from /CS to A0-A4 assured valid, and so that countdown requiring /CS at each tick would cover that.

However, it is simpler to .AND. CS & PHI2, since address lines are valid throughout /PHI2. It might be faster if CS and PHI2 are driven through the transceiver and then the AND takes place inside Vera, but it is more parsimonious of pins to AND CS and PHI2 externally and drive that /VeraSEL through the transceiver.  And a single /VeraSEL is more flexible in interfacing to a variety of buses.

The 3-to-8 demux (CD74ACT138E) generating the IO CS signals internally ANDs 3 enable pins (G1, /G2A, /G2B); the attached image (image.png, not reproduced here) illustrated this.

The AND operation of CS and PHI2 could happen implicitly if G1 was connected to PHI2. That said, this would make the timing issue worse since that would mean CS would arrive 12ns after PHI2. I think that to avoid this the best option would be as you suggest - allow CS glitches through and also pass PHI2 through the bus transceiver to the fpga. VERA would then sample A0-A4, CS, RW#, and PHI2 on posedge PHI2.
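
For reference, the enable behaviour of a '138-style decoder can be modelled in a few lines of Python. Wiring G1 to PHI2 and using the 3-bit select input as the IO slot number are assumptions taken from the discussion above, not confirmed details of the X16 board.

```python
# Behavioural model of a 74x138 3-to-8 decoder with active-low outputs.
def decode_138(g1: int, g2a_n: int, g2b_n: int, select: int) -> list[int]:
    """Return outputs Y0..Y7; exactly one goes low (0) when the chip is enabled."""
    enabled = bool(g1) and not g2a_n and not g2b_n   # the internal 3-input AND
    return [0 if (enabled and index == select) else 1 for index in range(8)]


# Hypothetical wiring: G1 = PHI2, /G2A and /G2B low for the IO page, select = slot 1.
print("PHI2 low: ", decode_138(0, 0, 0, 1))   # no output asserted
print("PHI2 high:", decode_138(1, 0, 0, 1))   # Y1 (slot 1) asserted low
```

With that wiring, no slot select can go active while PHI2 is low, which is the implicit AND described above.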

On 11/3/2021 at 8:16 AM, BruceMcF said:

SD should not be an issue reading in a single clock, since the access to the SPI data port can be organized to be entirely parallel to the Vera pipeline generating the next display row using SPRAM data. The possible contention is when there is a read of port A or port B. If the pipeline is organized to access the SPRAM in alternate clock cycles, then the non-contending approach is to use the other clock cycle for system bus to SPRAM operations, which may require two Vera internal clocks. If the pipeline is organized to access the SPRAM in every clock cycle and is simply paused when there is a system bus to SPRAM operation, then perhaps the internal read can be accomplished in a single internal Vera cycle.

 

Accessing SPRAM during the same PHI2 clock as the host read is probably a non-starter. The contention with the scanline composer means that you need one cycle to determine whose address goes into SPRAM and the SPRAM IP will respond one cycle after that. Since all VERA addresses are functionally registers, I don't think any operation needs to do this. VRAM reads require the address to be pre-configured, so there is time to fetch the memory and have it waiting in DATA0/DATA1.


On 11/3/2021 at 12:57 PM, Wavicle said:

.. VRAM reads require the address to be pre-configured, so there is time to fetch the memory and have it waiting in DATA0/DATA1.

Yes, the functional equivalent to a write to a latch that is then written at a stage of the rowbuffer generation pipeline that is not reading SPRAM would be a "ready to be read" latched pre-fetch for Port A or B within a given number of Vera cycles of any previous access to the same port (and at reset). That would lead to a lot of unused reads, but it would work, and it would be straightforward to program into the pipeline cycle ... when the "bus access" phase comes up, if a buffered write is waiting, write, if no buffered write is waiting and a pre-fetch read latch has expired, read.
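
The scheduling rule described above can be sketched in Python as a toy model. Everything here is hypothetical: the class and method names, the list standing in for video RAM, and the auto-increment step are assumptions for illustration, not a description of the real VERA core.

```python
# Toy model of a data port with a posted write and a pre-fetched read latch.
class DataPortSketch:
    def __init__(self, vram: list[int], step: int = 1):
        self.vram = vram
        self.step = step            # ASSUMED address increment after each access
        self.addr = 0
        self.pending_write = None   # posted write waiting for a bus-access phase
        self.read_latch = None      # None means the pre-fetch has expired

    def set_address(self, addr: int) -> None:
        self.addr = addr % len(self.vram)
        self.read_latch = None      # old pre-fetch no longer matches the address

    def host_write(self, value: int) -> None:
        self.pending_write = value  # returns immediately; the write is posted

    def host_read(self):
        if self.read_latch is None:
            return None             # pre-fetch not ready yet in this sketch
        value, self.read_latch = self.read_latch, None
        self.addr = (self.addr + self.step) % len(self.vram)
        return value

    def bus_access_phase(self) -> None:
        """The pipeline slot in which the display logic is not using the RAM."""
        if self.pending_write is not None:
            self.vram[self.addr] = self.pending_write
            self.pending_write = None
            self.addr = (self.addr + self.step) % len(self.vram)
            self.read_latch = None
        elif self.read_latch is None:
            self.read_latch = self.vram[self.addr]


port = DataPortSketch([10, 20, 30, 40])
port.set_address(1)
port.bus_access_phase()          # pre-fetch fills the latch
print(port.host_read())          # latched value at address 1
port.host_write(99)
port.bus_access_phase()          # posted write lands at address 2
print(port.vram)
```

The host-side calls never wait on the RAM in this model; all RAM traffic happens in `bus_access_phase`, at the cost of pre-fetches that may never be read.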

 

