


Posts posted by Guybrush

  1. 26 minutes ago, AndyMt said:

    Yes, that's what I'm going for. And I'll have to change the entire palette over multiple scanlines. This makes the line IRQ not all too useful - I wonder how much you could do on the C64...

    Well, for one, you could not change the palette at all 😀.

    Of course, you could change the background color registers (1, 3 or 4, depending on the display mode), the sprite multi-color registers (2) and sprite color registers (8), but you'd have to time your code very precisely. And then there are badlines 😀.

    On a normal line with no sprites, you get 63 cycles for PAL, and 64 or 65 for NTSC. That's 23-25 cycles off-screen, even a bit more since you only have to perform your writes in the invisible part of the scanline. If you're changing global color registers, that's just about enough time to update 3-4 of them if you use self-modifying code (to directly update color values in LDA #val instructions). On a line with sprites, you lose up to 16 cycles in the invisible part, but if you're changing sprite colors you are more flexible in your timing because you know when the sprite is being displayed, and since the sprite color affects only one sprite... you get the point.

    Long story short, on the C64 you also need to design the screens carefully to make big changes to colors.
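
    As a rough check of those numbers, here is a small Python sketch. It assumes 40 cycles of visible display per line (an assumption chosen because it reproduces the 23-25 off-screen figure) and 6 cycles per LDA #val / STA abs pair; the function name is made up for illustration.

```python
# Rough cycle budget for color-register writes on one C64 scanline.
VISIBLE_CYCLES = 40   # assumption: 40 character columns, one cycle each
LDA_IMM = 2           # LDA #val
STA_ABS = 4           # STA $D0xx


def writes_per_line(cycles_per_line, sprite_penalty=0):
    """Number of LDA #/STA pairs that fit in the off-screen part of a line."""
    off_screen = cycles_per_line - VISIBLE_CYCLES - sprite_penalty
    return max(off_screen, 0) // (LDA_IMM + STA_ABS)


for name, cycles in (("PAL", 63), ("NTSC, 64-cycle", 64), ("NTSC, 65-cycle", 65)):
    print(name, cycles - VISIBLE_CYCLES, "cycles off-screen,",
          writes_per_line(cycles), "register writes")
```

    With the full 16-cycle sprite penalty, the same budget shrinks to a single write on a PAL line, which is why the screen layout matters so much.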

    • Like 1

  2. 33 minutes ago, AndyMt said:

    I experimented with the raster line IRQ the last few days, too. I got it working, but there is very few cycles left you can use. Updating a (256) color palette seems almost impossible. I'll try to use the auto-increment feature of VERA, to improve this, by storing the 2nd palette in VRAM, too.

    And that's with the emulator- no idea if this will work on real hardware.

    It's simply impossible to update the entire palette during one scanline, even if you use every available hardware trick. There are only 254 (266) CPU cycles per scanline; the exact number depends on whether the CPU runs on a dedicated 8 MHz clock or on a VGA clock divided by 3 (25.175 / 3 = 8.3916666 MHz).

    Since a simple LDA/STA combination to read and write from/to VERA data registers takes 8 cycles, even if you completely unroll the loop, you can copy 254 / 8 / 2 (2 bytes per palette entry) ~= 15 palette entries. And I'm not even taking the KERNAL IRQ handler overhead into account.

    Even if there was a DMA chip somewhere in the system, it couldn't update the palette in one scanline because it would need 256*2*2 = 1024 cycles, which also means that even a DMA internal to VERA couldn't do it because a VGA 640x480 scanline is only 800 cycles long.

    The only way to update the entire palette in one scanline would be to have a relocatable palette.

    So, you can only change a few colors each scanline and must design your graphics accordingly.
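
    The arithmetic above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the constants come from the figures in this post (800 pixel clocks per line at 25.175 MHz, 8 cycles per LDA/STA pair, 2 bytes per palette entry), the function names are made up, and IRQ overhead is ignored.

```python
# How many palette entries can the CPU copy during one VGA scanline?
PIXEL_CLOCKS_PER_LINE = 800   # 640x480 VGA line, including blanking
VGA_CLOCK_MHZ = 25.175
CYCLES_PER_COPY = 8           # LDA abs (4) + STA abs (4), fully unrolled
BYTES_PER_ENTRY = 2


def cycles_per_scanline(cpu_mhz):
    line_time_us = PIXEL_CLOCKS_PER_LINE / VGA_CLOCK_MHZ
    return int(cpu_mhz * line_time_us)


def entries_per_scanline(cpu_mhz):
    return cycles_per_scanline(cpu_mhz) // (CYCLES_PER_COPY * BYTES_PER_ENTRY)


for label, mhz in (("dedicated 8 MHz", 8.0), ("VGA clock / 3", VGA_CLOCK_MHZ / 3)):
    print(label, cycles_per_scanline(mhz), "cycles,",
          entries_per_scanline(mhz), "palette entries")
```

    Either way, the result is roughly 15 or 16 entries out of 256, before any interrupt overhead.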

    • Like 1

  3. Be aware that VERA hardware will probably have very different timings. The emulator renders the current line into the buffer in one cycle (from the CPU's perspective), while real hardware will be rendering most of the time during the scanline. If I'm understanding things correctly, real VERA will actually be rendering the next scanline into 3 separate line buffers (sprites, layer 0, layer 1) while it composes and outputs the current scanline from its line buffers. Which means that VERA is actually using a sort of scanline double buffering internally. See this comment from VERA's designer, Frank van den Hoef.

    I suppose VERA hardware could latch the contents of its registers (layer settings, sprite settings etc.), but it's not going to be latching the VRAM. Therefore, writes to VRAM might affect the rendering of the scanline on real hardware. And if VERA doesn't perform any latching of its registers, any changes of those registers may also affect the rendering of the current (actually, next) scanline.

    So, it really remains to be seen how the real hardware handles the rendering and what are the exact timings, e.g. when does the rendering of the scanline start/end and which, if any, registers are being latched. At this time it is only safe to assume that any changes that might affect the rendering of the scanline can only be made during a small window, but since we don't know where and how long this window is, we're back to square one 😀

    TL;DR the emulator is not at all accurate when it comes to line interrupts. Any effects you create and test with the emulator may or may not work on real hardware. YMMV.

  4. @rje Please be aware that a 256x256 tile map is really not usable since it uses 128k of VRAM (256 * 256 * 2 bytes). I suggest you use a tile map size only slightly larger than your visible game area, and load in new columns or rows as needed. And just try to wrap (pun intended) your mind around the fact that tile layers wrap around 😀
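
    To put numbers on that, here is a small Python sketch of the map size calculation. The 2 bytes per map entry come from the formula above; the 128 KB VRAM total and the 320x240 screen with 8x8 tiles (40x30 visible tiles) are assumptions used for illustration.

```python
# VRAM cost of a tile map at 2 bytes per map entry.
VRAM_BYTES = 128 * 1024        # assumption: total VERA video RAM
BYTES_PER_MAP_ENTRY = 2


def map_bytes(width_tiles, height_tiles):
    return width_tiles * height_tiles * BYTES_PER_MAP_ENTRY


for width, height in ((256, 256), (128, 64), (64, 32)):
    size = map_bytes(width, height)
    print(f"{width}x{height}: {size} bytes, "
          f"{100 * size / VRAM_BYTES:.1f}% of VRAM")
```

    A 64x32 map is the smallest size that covers a 40x30 visible area and costs only 4 KB, leaving the rest of VRAM for tile and sprite data.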

    • Thanks 1

  5. Character modes are simply two kinds of tile mode where there's 256 different tiles (characters) available.

    The other tile modes make 1024 different tiles available, in sizes of 8x8, 8x16, 16x8 or 16x16 pixels. Also in 1, 2, 4, or 8 bits per pixel, and in map sizes of 32, 64, 128 or 256 tiles vertically or horizontally, in any combination.

    So, there's really 4 tile sizes * 4 bit depths * 4 map sizes horizontally * 4 map sizes vertically = 256 different combinations per tile layer.

    Oh, and tiles in these "real" tile modes can be horizontally and vertically mirrored.

    I suggest you read the VERA Programmer's Reference thoroughly and try to understand the tile modes, because they're VERA's main strength.
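
    The count of 256 combinations can be enumerated directly. This Python snippet is only an illustration of the arithmetic above; the list names are made up and do not correspond to VERA register fields.

```python
from itertools import product

TILE_SIZES = [(8, 8), (8, 16), (16, 8), (16, 16)]   # pixels, width x height
BIT_DEPTHS = [1, 2, 4, 8]                            # bits per pixel
MAP_DIMS = [32, 64, 128, 256]                        # tiles per row or column

# One combination = tile size, bit depth, map width, map height.
combos = list(product(TILE_SIZES, BIT_DEPTHS, MAP_DIMS, MAP_DIMS))
print(len(combos))  # 4 * 4 * 4 * 4 = 256
```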


  6. 17 minutes ago, The 8-Bit Guy said:

    Excellent idea.  I hadn't considered this.  It would still require changing the data port before and after the IRQ runs, but that's probably less code and CPU cycles than backing up the registers.  

    It would only require changing the CTRL register's ADDRSEL bit to set the active port while reloading the address registers but that's just one register instead of three.

    If you don't use any of the KERNAL's graphics functions then you have complete control of the VERA and you can be sure that nothing else is touching the address registers and the CTRL register. Then you can simply do LDA #$01, ORA $9f29 when you enter the interrupt handler and LDA #$FE, AND $9f29 before exiting. That's just 12 cycles instead of 15 when using the stack. 😛 I know that's not that important on Commander X16 compared to C64, but I just can't help myself 😆.

  7. 1 minute ago, novemix said:

    Still seems odd, when half of zero page, and 2 more pages ($200-$3ff) are reserved.  (mostly, why does it need $200-$3ff, when there's 8k in bank 0?)

    Because that area stores variables that are probably often used, and if they were in bank 0, then most if not all KERNAL and BASIC functions would need to switch banks all the time.

    Imagine calling a KERNAL function with a parameter that references a memory area in bank 2, for instance... now if KERNAL needs to access its variable(s) stored in bank 0 for each byte it needs to process, that would require at least two bank switches per byte processed. Not a very efficient use of processor time, is it?

  8. As per the Commander X16 documentation on Github:

    This is the allocation of banked RAM in the KERNAL/BASIC environment.

    Bank Description
    0 Used for KERNAL/CBDOS variables and buffers
    1-255 Available to the user

  9. 1 hour ago, svenvandevelde said:

    @Guybrush Yes, but the thing is, in lower resolutions the stuff looks ... well ... yeah ... vintage ...

    I don't know what you expected... there's (almost) nothing more vintage than 6502 😛

  10. @svenvandevelde I would just like to say that even the Amiga and Atari ST games were 320*200 or something similar. So were most PC games well into the 90's. Commander X16 is an 8-bit computer, and those were 16 or even 32-bit.

    Use the 320*240 mode for games (or software in general) with loads of colors and sprites.

    Use the 640*480 mode for GUI applications or software with 2 or 4-bit color.

    If you want more memory, fork the emulator, it should only be a couple of changes.

  11. 1 hour ago, m00dawg said:

    Is there a plan on how the FPGA would be implemented on the final board rev? On revisions 1 and 2, the YM chips socket right onto the board, but an FPGA solution would have to piggyback on both those sockets? Has any thought been given to that and/or using a stackable daughterboard (ala VERA) and just a pin header (or even just putting it on a card)? Or will that just be a future motherboard revision down the road?

    I believe the idea is to use a beefier FPGA to run all (most?) of the chips... CPU, VERA, VIAs, YM, address decoding logic, the whole shebang. That would leave only RAM, ROM and perhaps some glue logic as separate physical chips.

  12. Your RLE format is somewhat hungry for memory space 😀.

    I'd use a [RunLength], [Value] format where RunLength's MSB indicates whether it's followed by a single byte value that is repeated N times, or by N bytes that are only used once.

    So, a data stream that looks like ABCDDDDDDDEFGHIJJJJJJ would be stored as 3,A,B,C,135,D,5,E,F,G,H,I,134,J for a total of 14 values, while your format would be A,1,B,1,C,1,D,7,E,1,F,1,G,1,H,1,I,1,J,6 for a total of 20 values. The difference would be even greater if there are many non-repeated values.

    Of course, this could be extended to indicate, for instance, that what follows is a pattern of M values that should be repeated N times, by using a few bits of the RunLength byte.
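
    A minimal Python sketch of this format, to make the byte layout concrete. It is an illustration under a few assumptions rather than a finished codec: runs are capped at 127 (the 7 bits left in the control byte), runs shorter than 3 are stored as literals (an arbitrary threshold), and the function names are made up.

```python
def rle_encode(data: bytes, min_run: int = 3) -> bytes:
    """Control byte: MSB set = repeat next byte N times, MSB clear = N literals."""
    out = bytearray()
    literals = bytearray()

    def flush_literals():
        # Emit pending literal bytes in chunks of at most 127.
        for start in range(0, len(literals), 127):
            chunk = literals[start:start + 127]
            out.append(len(chunk))
            out.extend(chunk)
        literals.clear()

    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 127:
            run += 1
        if run >= min_run:
            flush_literals()
            out.append(0x80 | run)
            out.append(data[i])
        else:
            literals.extend(data[i:i + run])
        i += run
    flush_literals()
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(encoded):
        control = encoded[i]
        count = control & 0x7F
        i += 1
        if control & 0x80:
            out.extend(encoded[i:i + 1] * count)   # one value, repeated
            i += 1
        else:
            out.extend(encoded[i:i + count])       # literal bytes
            i += count
    return bytes(out)


sample = b"ABC" + b"D" * 7 + b"EFGHI" + b"J" * 6
packed = rle_encode(sample)
print(len(sample), "->", len(packed))   # 21 -> 14
```

    On the example stream this produces 3,A,B,C,135,D,5,E,F,G,H,I,134,J, the same 14 values as listed above.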

    • Like 2

  13. First of all, that chip uses SPI or I2C to communicate with the host computer. That means (in case of Commander X16) that you'd be spending most of your cycles bit-banging its control and data lines through one of the VIAs. I think you'd need something on the order of 100 cycles just to send one byte over. Since one frame (assuming the display is still working at 60Hz) has approx. 133,333 cycles, you'd probably be able to send a whopping 1,300 bytes during one frame. Maybe it could be faster but still nowhere near the simplicity and speed of writing to a VERA data port.
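
    The throughput estimate works out as follows in Python. The 100 cycles per byte is the rough guess from the paragraph above, the 8 MHz clock is assumed, and the 8 cycles per byte for a VERA data port write (one LDA/STA pair) is added only for comparison.

```python
# Bytes per frame: bit-banged serial link versus a memory-mapped data port.
CPU_HZ = 8_000_000                 # assumption: 8 MHz CPU clock
FRAME_RATE_HZ = 60
BITBANG_CYCLES_PER_BYTE = 100      # rough guess, not measured
DATA_PORT_CYCLES_PER_BYTE = 8      # LDA abs + STA abs

cycles_per_frame = CPU_HZ // FRAME_RATE_HZ
bitbang_bytes = cycles_per_frame // BITBANG_CYCLES_PER_BYTE
data_port_bytes = cycles_per_frame // DATA_PORT_CYCLES_PER_BYTE

print(cycles_per_frame, bitbang_bytes, data_port_bytes)
```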

    Secondly, that chip works with display lists which themselves contain many different graphics primitives (and ones not so primitive):


    Main features of the graphics engine are:

    • The primitive objects supported by the graphics processor are: lines, points, rectangles, bitmaps (comprehensive set of formats), text display, plotting bar graph, edge strips, and line strips, etc.
    • Operations such as stencil test, alpha blending and masking are useful for creating a rich set of effects such as shadows, transitions, reveals, fades and wipes.
    • Anti-aliasing of the primitive objects (except bitmaps) gives a smoothing effect to the viewer.
    • Bitmap transformations enable operations such as translate, scale and rotate.
    • Display pixels are plotted with 1/16th pixel precision.
    • Four levels of graphics states
    • Tag buffer detection

    The graphics engine also supports customized built-in widgets and functionalities such as jpeg decode, screen saver, calibration etc.

    Essentially, you get something that performs anti-aliased blending of JPEG files and geometric primitives all on its own. That's not a video controller, that's a GPU. Think of it as a bastard child of Amiga's Copper and an early OpenGL accelerator. And I do think it's a bit over-the-top for this kind of computer.


  14. 7 minutes ago, TheUnknownDad said:

    Ok, so will this mode be supported on the X16? I am currently thinking of developing an expansion card for communications. Moving "large" memory blocks (large in terms of X16) is very CPU intensive and requires many RAM access events.

    Is getting next command from RAM/ROM in the 6502 also tied to the same bus at the same system clock? If so, moving a single byte would involve like about 10-12 clock ticks - pausing the cpu would make IO run at least 10x the speed. Am I right to this? And is this possibly supported for X16 expansion cards? 

    Since the expansion ports (at least to my understanding) have all 16 address lines and both the RDY and BE lines, you should be able to halt the CPU and take over the system, even going as far as replacing the 6502 with another CPU (e.g. a 65816) on your expansion card.

    • Like 1

  15. 16 minutes ago, Cyber said:

    @lamb-duh @TomXP411

    Ok, so I/O latches usually are physically outside the main RAM. We also call these latches "device registers". When we access them programmaticaly we actually don't care how they implemented physically, we just read or write I/O address in memory, thus communicate with device.

    But speaking physically - what are these latches (registers) are? Is it internal part of actual device? Or is it just another separate RAM-like chip somewhere on board? Or it might be one or another?

    In the case of the VIAs, YM-2151 and VERA, they're internal to the chip.

    In the case of $00 and $01, they're in separate components on the board, probably in the same bunch as the address decoding logic.

    • Like 1
    • Thanks 1

  16. Yeah, TGA is quite programmer-friendly, but many (most?) TGA files are written bottom to top, which means you'll probably have to reload VERA's address register for every line of the bitmap.

    I remember playing around with Turbo Pascal back in the 90's, writing a TGA image viewer for files created by POV-Ray and being quite surprised when the image finally showed up, only upside-down. Of course, I only had a faint idea what that thing they call the Internet was and you can imagine it was quite difficult obtaining file format specifications. Ahhhh, happy days 🙂
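
    For what it's worth, the row order can be read from the file header. The Python sketch below assumes the standard 18-byte TGA header, where bit 5 of the image descriptor byte (offset 17) is set for top-to-bottom files. The function names, the VRAM base address and the stride are hypothetical.

```python
import struct


def tga_row_order(header: bytes):
    """Return (width, height, top_down) from an 18-byte TGA header."""
    if len(header) < 18:
        raise ValueError("a TGA header is 18 bytes long")
    # Offsets 12-17: width, height (little-endian 16-bit), depth, descriptor.
    width, height, _depth, descriptor = struct.unpack_from("<HHBB", header, 12)
    top_down = bool(descriptor & 0x20)
    return width, height, top_down


def row_start_addresses(vram_base, stride, height, top_down):
    """Target address of each row, in the order the file stores the rows."""
    rows = range(height) if top_down else range(height - 1, -1, -1)
    return [vram_base + y * stride for y in rows]


# A bottom-up 320x240, 8-bit image: the first row in the file is the last on screen.
header = bytes(12) + struct.pack("<HHBB", 320, 240, 8, 0x00)
width, height, top_down = tga_row_order(header)
print(row_start_addresses(0x00000, width, height, top_down)[:3])
```

    For a bottom-up file the addresses decrease from row to row, which is why the address register has to be reloaded for every line instead of relying on auto-increment alone.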

    • Like 1

  17. 17 minutes ago, JimmyDansbo said:

    How did you imagine that the 1bpp masks should be created? I think you would need to outline your png or similar ?

    Well, if you use PNG transparency, it should be relatively easy. Whether your sprite graphics correspond 1:1 to your collision mask is another matter. Suppose you're making a remake of Barbarian or International Karate... sometimes you need to allow some sprite overlap and not register it as a collision, e.g. hand-to-hand collision should be expected and not count as a hit. So, even though you could use sprite graphics to generate the collision mask automatically, there's plenty of situations where that's not practical.
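
    As an illustration of the automatic route, here is a minimal Python sketch that packs alpha values into a 1bpp mask. It assumes the PNG has already been decoded into rows of alpha values (0-255) by some other tool; the function name, the threshold and the MSB-first bit order are assumptions.

```python
def alpha_to_mask(alpha_rows, threshold=1):
    """Pack rows of alpha values into 1bpp rows; MSB is the leftmost pixel."""
    mask = []
    for row in alpha_rows:
        packed = bytearray((len(row) + 7) // 8)
        for x, alpha in enumerate(row):
            if alpha >= threshold:          # pixel counts as solid
                packed[x // 8] |= 0x80 >> (x % 8)
        mask.append(bytes(packed))
    return mask


# One 9-pixel row: pixels 1, 2 and 8 are (partly) opaque.
print(alpha_to_mask([[0, 255, 255, 0, 0, 0, 0, 0, 128]]))
```

    A mask generated this way matches the graphics 1:1, so cases like the hand-to-hand overlap mentioned above would still need a hand-edited mask.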

  18. Well, for one, your article is written in such a manner that one might think the computer is already being sold, which it obviously isn't. In multiple places you refer to the computer in the present tense, where the future tense would definitely be more appropriate.

    Wikipedia's general notability guideline (to which the message refers) clearly states that the topic must have "significant coverage in reliable sources that are independent of the subject". This site (or the Facebook group) is definitely not independent of the subject. Since the computer hasn't been released yet, there really can't be any significant coverage in other reliable sources.

    Frankly, this message from the admins really doesn't surprise me. I've expected something like this to happen ever since I first saw the article; I'm only surprised by how quickly they acted 😛.

    • Like 2
    • Thanks 1