Everything posted by StephenHorn

  1. Michael Steil is probably the guy who needs to accept those PRs. Busy fellow, though.
  2. Many, many 65C02 instructions use absolute addressing. You would have to code your entire program to use relative addresses, ZP addresses, and ZP indirection, and you would still likely have, at the very least, regions of memory that were forbidden because the program needed them for variables. The alternative, as Matt suggests, would be an OS to handle this for you, implementing some kind of virtual page table. I'm not actually sure how that would work without also having a hardware memory controller to aid with the mapping, which the X16 will not have and is not being considered, because c'mon, it's a 64K address space with bank switching on an 8KB window. It'd be like bolting the automatic transmission from a Nissan Sentra onto your 1HP push lawn mower. I'm sure someone is working on that Youtube video, but it's a bit much for the base X16.
  3. So it looks like those are 64x64 sprites. 4bpp? If so, each sprite drawn on a line costs 1+8+64 work units, or 73 work units. So you can expect to draw 10 of these on a line, plus most of an 11th, before you'll see tearing (enough that you'll probably only tear the very right-most edge of the 11th). The real question is why you are not seeing much (much, much) more tearing than you do. Another thing to keep in mind is that the sprites are not drawn left-to-right; they are drawn in ID order. I'd have to step through the code with a snapshot of VRAM (or simply with your code and assets) to suss out why you're seeing this particular tearing behavior, but in a general sense you are very much overloading the VERA with sprite data per line.
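To put rough numbers on that, here's a small C sketch of the arithmetic. The per-sprite cost (1 setup + 8 fetch + 64 output units for a 64-wide 4bpp sprite) is the breakdown above; the ~800 work-unit per-line budget is my assumption, back-solved from the "10 plus most of an 11th" figure rather than taken from the VERA docs.

```c
#include <stdio.h>

int main(void) {
    /* 1 setup unit + 8 fetch units (64 px / 8 px per 32-bit fetch) + 64 output units */
    int per_sprite  = 1 + 64 / 8 + 64;   /* = 73 work units */
    int line_budget = 800;               /* assumed per-line budget */

    printf("work units per sprite:  %d\n", per_sprite);
    printf("whole sprites per line: %d\n", line_budget / per_sprite);  /* 10 */
    printf("units left for an 11th: %d of %d\n",
           line_budget % per_sprite, per_sprite);                      /* most of one sprite */
    return 0;
}
```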
  4. There is a line IRQ feature, and you can scroll tile and text layers, which can produce the effect I believe you're describing. That said, line IRQs are broken on r37. You can build the current source code yourself, though, and get the fix.
  5. "Underflowing" is just what happens when you try to subtract a larger unsigned number from a smaller unsigned number. So, okay, let's suppose you have some 8-bit value, such as %00001011 (which is 11). If I subtract %00001100 (12) from it, I get %11111111 (255), because the math "underflowed". Instead of just stopping at zero, the computer kept going. It's exactly the same thing as when you start with %11111111 (255) and add 1: the computer "overflows" to %00000000. When crossing 0 with a subtraction, it's called an "underflow". And in fact, whenever you see a binary number with the highest bit set, you can also think of it as the negative of the number you get when you invert all the bits and add 1. So %11111111 (255) can be seen as -1: invert the bits to get %00000000 (0), add 1 to get 1, and negate. The reason the VERA "starts over at $400" is that the sprite position is only 10 bits wide (see also: VERA reference). $400 is %0100 00000000, in which the lower 10 bits are all zero. So the VERA sees that as %00 00000000. The assembly instructions you'll work with will generally work in 8-bit units instead of 10-bit units, but if you have some pair of bytes %00000000 and %00000000, and subtract %00000001 and %00000000, you'll get %11111111 and %11111111. You can just chop off the top 6 bits from the second byte when you write them to the VERA, because the VERA will only see it as %11 11111111 anyway. Edit: Trying to figure out the best phrasing to teach the concept...
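Here's the same arithmetic as a small C sketch, using unsigned types to get the same wrap-around behavior; the 10-bit mask stands in for the VERA only looking at the low 10 bits of a sprite position (the exact register layout is in the VERA reference).

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t a = 0x0B;                 /* %00001011 = 11 */
    uint8_t b = 0x0C;                 /* %00001100 = 12 */
    uint8_t diff = (uint8_t)(a - b);  /* underflows to %11111111 = 255 */
    printf("11 - 12 as a byte: %u\n", diff);

    /* A 16-bit position of 0 minus 1 underflows to 0xFFFF; the VERA only sees
       the low 10 bits, so it treats it as %11 11111111, i.e. -1. */
    uint16_t x = 0;
    x = (uint16_t)(x - 1);
    printf("raw 16-bit value: %u, as the VERA sees it: %u\n", x, x & 0x03FF);
    return 0;
}
```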
  6. The trick is that even though the X and Y values are technically "unsigned" (i.e. non-negative), you can still get the effect of negative numbers by going ahead and subtracting from 0. The value will "underflow", but you can write the bits to the VERA anyways and it will treat it like a negative number. Another way to look at it is that you've moved the sprite so far off the right side of the screen, that the VERA eventually wraps part of the sprite around to the left edge.
  7. I very likely have front porch/back porch reversed. I'm relatively new to this stuff as well! @Frank van den Hoef would have to answer your question about whether the line counter continues into vblank. The VERA emulation in r37 won't trigger the line IRQ after drawn lines, but if you build your own from the current github code (which has fixed the line IRQs), you can set a line IRQ on the last visible line and get the approximate timing of when the vsync interrupt should occur.
  8. Something else to bear in mind, as well, is that even if you adjust DC_VSCALE to draw the screen as if it's 240 lines tall, the VERA is still operating in 640x480 mode. It's just scaling the internal graphics, not actually changing the resolution and pixel clock.
  9. You should have roughly 11,428 cycles with a full vblank, but the timing of the VERA emulation is only giving you around 5,587 cycles. Specifically, the number of cycles per scanline is 8,000,000 / 525 / 60, or about 253.968 cycles. You should get 45 scanlines' worth of cycles during vblank (525 scanlines per frame, minus 480 visible lines), but the timing issue means you only get 22 scanlines' worth of cycles.
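A quick C sketch of that arithmetic (8 MHz CPU, 525 total scanlines per 60 Hz frame, 480 of them visible; the 22-line figure is what the r37 emulator effectively gives you):

```c
#include <stdio.h>

int main(void) {
    double cpu_hz          = 8000000.0;
    double lines_per_frame = 525.0;   /* VGA 640x480@60: 480 visible + 45 blanking */
    double fps             = 60.0;

    double cycles_per_line = cpu_hz / lines_per_frame / fps;   /* ~253.968 */
    printf("cycles per scanline:           %.3f\n", cycles_per_line);
    printf("full vblank (45 lines):        %.1f cycles\n", 45 * cycles_per_line);
    printf("what r37 gives you (22 lines): %.1f cycles\n", 22 * cycles_per_line);
    return 0;
}
```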
  10. r37 has a minor bug that, for games' sake, we really ought to fix. The VERA emulation counts scanlines starting at the beginning of the VGA front porch, then after the front porch it begins drawing the 480 visible lines, then counts scanlines through the end of the VGA back porch... and only then triggers the vblank interrupt. This is incorrect timing; the real VERA hardware will trigger the vblank interrupt at the proper start of vblank. (Also: Crud! I keep forgetting about this.) What this means is that the emulator isn't giving you as much vblank time as you should have. Since line IRQs are also broken on r37, there is presently no workaround. I should check whether there's an issue about this on Github yet... and make one if there isn't. It would be a good first issue for someone interested in contributing.
  11. It could be feasible, especially with the aid of a separate microcontroller to handle the serial communications between the controller and the X16 at the likely spec of 250kHz. You'd also need to provide your own step-down regulation to take the +12V line down to the vibromotors' expected 7.2V, and I'm not entirely sure how much current that line can handle relative to the draw of the vibromotors. One source says the motors pull 300mA steady after a 500mA start-up, which ought to work out to about 190mA on the 12V line after a 320mA start-up. You'll also want to be aware of the minor hazard that is the 5V TTL on the X16, versus the 3V3 TTL (apparently) of the PS2 controller. None of this seems especially difficult, it's just a variety of finicky details, since the PS2 controller is a good 2.5 generations of computing removed from the X16, even allowing for the generous interpretation that the X16 straddles the 8-bit/16-bit generational divide in the same way that the PC Engine/TurboGrafx-16 did.
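The 190mA / 320mA figures look like a straight power conversion across the step-down; here's that arithmetic as a small C sketch (the 95% converter efficiency is my assumption to make the numbers line up).

```c
#include <stdio.h>

int main(void) {
    double v_motor   = 7.2,  v_rail    = 12.0;   /* volts */
    double i_steady  = 0.300, i_startup = 0.500; /* motor draw in amps */
    double efficiency = 0.95;                    /* assumed converter efficiency */

    /* Power in = power out / efficiency, so rail current = V_m * I_m / (V_rail * eff). */
    printf("12V rail, steady:  %.0f mA\n", 1000 * v_motor * i_steady  / (v_rail * efficiency));
    printf("12V rail, startup: %.0f mA\n", 1000 * v_motor * i_startup / (v_rail * efficiency));
    return 0;
}
```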
  12. Strictly my opinion, but 640x480 sounds ambitious, even considering that you're limiting your 3D to the bottom 320x480. It's a fill rate issue: your 8MHz CPU can only write so quickly, even if we're just talking wireframes. Just keep in mind that 320x480 is 4x as much work as 160x240. Also, I'm assuming you plan to draw to something like a 1bpp bitmap, so your memory requirements are about 19KB and you can draw to a backbuffer and flip between the two. 2bpp is also theoretically possible, costing a reasonable 37.5KB but requiring twice as many writes to the VERA. 4bpp will be untenable: that's 75KB, and thus more than half of your VRAM -- this will severely limit your pixel drawing, because you won't be able to fit a backbuffer, so you'll either have to live with seeing the draw process happen over multiple frames or limit your drawing exclusively to the vblank.
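For reference, the VRAM math behind those figures, as a small C sketch:

```c
#include <stdio.h>

int main(void) {
    int w = 320, h = 480;
    int bpps[] = {1, 2, 4};
    int i;
    for (i = 0; i < 3; i++) {
        long bytes = (long)w * h * bpps[i] / 8;
        /* double it if you also want a second copy as a backbuffer */
        printf("%dbpp: %ld bytes (%.2f KB), doubled: %.1f KB\n",
               bpps[i], bytes, bytes / 1024.0, 2 * bytes / 1024.0);
    }
    return 0;
}
```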
  13. I can't answer all of this, but it would be inappropriate to assign keys on a keyboard to switches that never existed on the SNES controller. My guess is that the emulator only maps the NES equivalent because the NES vs. SNES decision was in limbo for a long time. The SNES controller had Up, Down, Left, and Right. Games that understood the concept of "diagonals" did so by checking for Up+Left, Up+Right, Down+Left, or Down+Right. I don't understand your question about what codes are returned for Ctrl and Alt... if you're polling the joystick, you don't care about scancodes or the like at all. The joystick interface doesn't give you codes. It gives you those 3 bytes, so if someone presses Ctrl, it'll set bit 7 on byte 0. If someone presses Alt, it'll set bit 6 on byte 0. If they press both, it'll set both of those. What this means for the SNES controller, however, is that if you press Ctrl, it'll still set bit 7 on byte 0, but this corresponds to the B button, while Alt still sets bit 6 on byte 0, which corresponds to the Y button. The reason is how the NES and SNES controllers were physically wired, and it's this wiring that the emulator is emulating. NES and SNES controllers used a compatible wiring standard -- it's possible to plug a NES controller into a SNES, or vice versa, with a passive adapter. A NES will only read 1 byte from the SNES controller, containing B, Y, Select, Start, Up, Down, Left, Right. But that's okay: the shift register will be re-latched, resetting the bit order of buttons before the next polling cycle. A SNES will try to read two bytes from the NES controller, which only has one shift register with 8 input lines; that's okay too, because the NES controller will just output a fixed value on the extra bits when the SNES tries to clock them out of the shift register, so it'll output A, B, Select, Start, Up, Down, Left, Right, 0, 0, 0, 0, 0, 0, 0, 0. Or if you *really* want to understand the wiring of SNES controllers and NES controllers, I recommend Retro Game Mechanics Explained's video on the subject.
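A tiny C sketch of testing those bits, assuming you've already fetched byte 0 of the joystick data (how you fetch it, and whether a pressed button reads as 1 or 0, depends on the KERNAL call and its convention, so check the docs; the bit positions are the ones described above).

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t byte0 = 0xC0;   /* example value with bits 7 and 6 set */

    /* Bit 7 corresponds to B (Ctrl on the keyboard joystick), bit 6 to Y (Alt). */
    if (byte0 & 0x80) printf("bit 7 set: B / Ctrl\n");
    if (byte0 & 0x40) printf("bit 6 set: Y / Alt\n");
    return 0;
}
```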
  14. Keep in mind that the latest ROM code may actually be ahead of the emulator's main branch. In particular, a change incoming to the emulator for r38, but which has not happened in the main branch to my knowledge, is to move the himem bank selection from I/O registers $9F60 and $9F61 to zero-page locations $00 and $01. So far, I've been able to work on the emulator in my forks using the vanilla r37 ROM. If you want to work on the ROM specifically, you may want to try pulling x16_board_r2. (Full disclosure: I haven't tried pulling any branches but the master branch myself; I'm just offering a guess because that branch has seen semi-recent changes. It's entirely possible that Michael Steil has a bunch of code changes in progress for r38 that simply haven't been pushed to Github yet.)
  15. @Michael Steil tends to take pull requests in batches as time allows. I don't have a lot of insight into the kinds of things he's actively looking for help with, aside from unit tests, but there is an "issues" section on the github page: https://github.com/commanderx16/x16-emulator/issues I do know that Michael's not particularly interested in further work on optimizing the emulator unless the gains are somewhere north of a 10% improvement (i.e. he's not interested in small-potatoes optimizations, but will still consider larger wins).
  16. Well, it's much faster to have the assets in himem because you can retrieve them more quickly than SDCard data. The SDCard requires banging away at the protocol for data transfer, which might be assisted somewhat by hardware but might also involve literally byte-banging single bytes through what effectively amounts to a serial port on an I/O address, with commands and replies to retrieve blocks of data... or worse, bit-banging a similar interface to do the same job but 8X slower. However the kernal ends up implementing it, it will be way slower than accessing himem. There are ways to mitigate the costs. For some assets, if you can preload their entirety into VRAM, then you could read straight from the SDCard into VRAM and skip himem. Fair enough. But again, for the 256x256 tilemap case, you have to store at least some of the data in himem, and at that point you might as well limit the contents in VRAM to a 128x64 window into the data (actually, you could probably go as low as an 81x41 window in 640x480 resolution with 8x8 tiles, and if you're using 320x240 and 16x16 tiles, you could go as low as 21x11... but powers of two make certain maths easier, and since they conveniently fit into the VERA's tilemap sizes, you could just scroll the layers and not have to worry about moving quantities of VRAM around to constantly re-align the tilemap).
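To put numbers on the window idea, a quick C sketch comparing keeping the full 256x256 map resident in VRAM against a 128x64 scrolling window (2 bytes per map entry, as above):

```c
#include <stdio.h>

int main(void) {
    long full   = 256L * 256 * 2;   /* whole map resident: 131072 bytes = 128 KB */
    long window = 128L * 64 * 2;    /* 128x64 VRAM window:  16384 bytes =  16 KB */
    printf("full 256x256 map in VRAM: %ld KB\n", full / 1024);
    printf("128x64 window in VRAM:    %ld KB\n", window / 1024);
    return 0;
}
```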
  17. When it comes to using up memory, I mean, art assets reign supreme. Want a 256x256 tilemap for your JRPG or platformer? Bam, 128KB of himem right there (256*256 is 64KB, *2 because tilemaps in the VERA are 2 bytes per tile). On both layers? Now 256KB. We haven't even started talking about the pixel data, right? Just the maps. If those tiles have random encounters (for JRPGs), then there's another 64KB for the encounter group table indices. Collision/exits/triggers/other meta? Probably another 64KB. We've consumed anywhere from 1/2 to 3/4 of the emulator's default system memory and haven't spent a byte on color or sound, or enemies, dialog, behavior... games can really eat through your memory budget in a snap. And there's only so much you can do behind a black screen to preload assets -- you can only keep so much resident in himem; the rest is locked away on that SD card. (And the VERA doesn't make for a great cache... I suppose in theory you could exchange groups of bytes with the VERA as you want to overwrite its contents, but the overhead for this would be relatively high.) You might try a "streaming" type of approach if you, say, put your game logic in the vblank interrupt and leave the non-interrupted execution to spin through a series of values which determine which files you want in which banks. That assumes that reading from the SD card won't, itself, generate interrupts which could clobber significant time slices of vblank. And you can only read in data so quickly, so you have to be careful about how much data you potentially need to stream in at any given time, and possibly continue to make allowances for when your player gets ahead of the streaming process.
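Here's that budget tallied up in a small C sketch (the 512 KB figure is my assumption for the emulator's default banked-RAM size):

```c
#include <stdio.h>

int main(void) {
    long tilemap     = 256L * 256 * 2;   /* one 256x256 map layer, 2 bytes/tile = 128 KB */
    long both_layers = 2 * tilemap;      /* 256 KB */
    long encounters  = 256L * 256;       /* 1 byte of encounter-table index per tile = 64 KB */
    long meta        = 256L * 256;       /* collision/exits/triggers = 64 KB */
    long total       = both_layers + encounters + meta;
    long himem       = 512L * 1024;      /* assumed default banked RAM */

    printf("total: %ld KB of %ld KB (%.0f%%)\n",
           total / 1024, himem / 1024, 100.0 * total / himem);
    return 0;
}
```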
  18. No, it'll definitely take up more of the X16's processing power and time, in particular because you can't just write directly to VRAM; you have to write through the VERA's I/O interface, so there's some preamble you have to do whenever you want to move the VERA's "cursor" into VRAM. I would assume that someone wanting to make a raytraced 3D game would start by creating a 320-wide bitmap layer, then setting DC_HSCALE and DC_VSCALE to 64, so they're working at 320x240. That's still 3.3X more pixels than the Gameboy, mind you, but I figure a person could reduce the necessary draw surface further through strategic placement of HUD elements in a tile layer, and, if it really comes down to knocking out a few last columns or rows of pixels in order to free up CPU time, you could always reduce the size of the display area with DC_VSTART, DC_VSTOP, DC_HSTART, and DC_HSTOP. I wouldn't be surprised if a lot of folks just don't notice if you're actually running 300x240, or the like, especially on a VGA display. (Just make sure to leave enough VRAM for a double-buffered display of that bitmap layer... you almost certainly can't redraw at 60Hz, so you'll want to draw to a backbuffer and only flip between the two during vblank.)
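For the "preamble" part, here is a rough cc65-style C sketch of what every batch of VRAM writes has to do: point the VERA's address registers at the destination, pick an auto-increment, then stream bytes through the data port. The register addresses are the standard $9F20-$9F23 window; the exact bit layout of ADDR_H and the increment field differs between VERA revisions, so treat that encoding as an assumption and check the reference for your ROM.

```c
#include <stdint.h>

/* VERA external registers (the $9F20 I/O window). */
#define VERA_ADDR_L   (*(volatile uint8_t *)0x9F20)
#define VERA_ADDR_M   (*(volatile uint8_t *)0x9F21)
#define VERA_ADDR_H   (*(volatile uint8_t *)0x9F22)
#define VERA_DATA0    (*(volatile uint8_t *)0x9F23)

/* Point the VERA at a VRAM address with an auto-increment of 1 per DATA0 access.
   The increment encoding in ADDR_H here is an assumption; see the VERA reference. */
void vera_seek(uint32_t addr) {
    VERA_ADDR_L = (uint8_t)(addr & 0xFF);
    VERA_ADDR_M = (uint8_t)((addr >> 8) & 0xFF);
    VERA_ADDR_H = (uint8_t)(((addr >> 16) & 0x0F) | 0x10);  /* high bits + increment */
}

/* Copy a row of pixels into the bitmap backbuffer. */
void blit_row(uint32_t vram_dest, const uint8_t *src, uint16_t n) {
    uint16_t i;
    vera_seek(vram_dest);        /* the per-batch preamble */
    for (i = 0; i < n; i++) {
        VERA_DATA0 = src[i];     /* each write lands at the next VRAM byte */
    }
}
```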
  19. As far as I know, the number of active layers makes no difference to the sprites' work units. The bits-per-pixel of the sprites, however, might. The emulator currently doesn't account for bpp, and as best I can remember it never has, but there is a comment in the code that reads "one clock per fetched 32 bits", which suggests 8bpp sprites should cost a work unit per 4 pixels of width, while 4bpp sprites should cost a work unit per 8 pixels of width.
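Following that "one clock per fetched 32 bits" comment through, a small C sketch of what the per-line cost would look like if the fetch cost did scale with bpp (this is my reading of the comment, not what the emulator currently does):

```c
#include <stdio.h>

/* 1 setup unit + one fetch unit per 32 bits of pixel data + one output unit per pixel. */
int work_units(int width, int bpp) {
    int pixels_per_fetch = 32 / bpp;   /* 8 px per fetch at 4bpp, 4 px at 8bpp */
    return 1 + width / pixels_per_fetch + width;
}

int main(void) {
    printf("64-wide 4bpp sprite: %d units\n", work_units(64, 4));   /* 73 */
    printf("64-wide 8bpp sprite: %d units\n", work_units(64, 8));   /* 81 */
    return 0;
}
```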
  20. As far as I'm aware, it does not move VRAM content; that has never been a part of the emulator's implementation of HSCROLL and VSCROLL. If I were to guess, it bit-shifts HSCROLL and VSCROLL left by 7 bits and copies them into internal 16-bit accumulators (representing 9.7 fixed-point values). Then, as it draws, it increments the accumulators by HSCALE for each column and VSCALE for each row (which are 1.7 fixed point), and decides the "real X, Y" by reading the top 9 bits of the accumulators (the truncated integer portion). Edit: Actually, this may be more than a guess; I seem to recall Frank has previously discussed this on Facebook, in a comment thread where someone wanted to change the basis value for scaling from 128 to 240, so as to provide finer-grained scaling, and suggested a relatively fast division-by-5 hardware design to try to make it possible. This seems to have been ultimately rejected. In fairness, I believe HSCALE and VSCALE were originally only meant to provide power-of-2 scaling, and the VERA had already grown somewhat beyond its original design by taking on features like the PSG, so LUTs and Frank's own time may be at an extreme premium at this point.
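Here's that guess written out as a small C sketch: 9.7 fixed-point accumulators seeded from the scroll value, bumped by the 1.7 fixed-point scale value each column, with the integer part recovered by a 7-bit shift. This is my reconstruction of the scheme described above, not the actual VERA or emulator code.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t hscroll = 3;      /* scroll, in source pixels */
    uint8_t  hscale  = 64;     /* 1.7 fixed point: 64/128 = 0.5, i.e. 2x zoom */
    uint16_t acc     = (uint16_t)(hscroll << 7);   /* 9.7 fixed-point accumulator */
    int col;

    for (col = 0; col < 8; col++) {
        uint16_t real_x = (acc >> 7) & 0x1FF;      /* top 9 bits: which source pixel to read */
        printf("output column %d reads source x=%u\n", col, real_x);
        acc = (uint16_t)(acc + hscale);            /* advance by the scale factor */
    }
    return 0;
}
```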
  21. The lesson I learned from my own adventures in case-sensitivity was to just put the filenames in the source file in the same case as the filenames will exist on disk or in the zip. The reason being that the emulator's hooks for LOAD() and SAVE() don't bother converting between PETSCII and ASCII; they just take the binary filename in memory at face value and pass it straight to the OS functions.
  22. 8MHz is certainly 8X more processor power than the C64 had, and the C64 had Battlezone, Elite, and The Sentinel, all of which were 3D games to a certain extent. So I think this would be well within the realm of the possible, especially if you stay with wireframe, but it will be a challenge.
  23. DMA (direct memory access). You're asking for DMA.
  24. Hrm, I'm not sure if that would help. Something I'm not sure I conveyed well is that the sprite collision on the VERA is determined by whether or not it tried to draw two or more sprites over the same pixel. So the work units *really* matter, all the way through. I think there's something, though, to be mined from the approach of precalculating the work units per line for each sprite, so that we have less to calculate on-the-fly later on and can at least zip through the 128 sprites a little more quickly with faster ops when we're confident that there's enough work units to get through the entire sprite. Another thought is that it may be useful to cache precalculated data about sprites, so that we don't have to pay the cost of re-calculating sprite info for looping sprite animations. In my most recent branch for working on performance improvements, I'm doing a lot of backbuffering of layer data, and then caching the backbuffers and layer settings based on a signature for the layer, so this is a similar idea to that. My video_layer_properties caching is pretty naive, though... it's just a linked list of up to 16 elements, but it gets the job done and removes the performance hit of switching between layer settings after the cache has been warmed up. I think a similar approach would be useful for sprites, but I anticipate needing something more like a hash table for storing hundreds of video_sprite_properties structs. There's also an enormous caveat because it's very easy to have to invalidate most, if not all, of the sprite cache when performing a write to VRAM that isn't directly to sprite properties, because otherwise we have to loop through every entry in the sprite cache and individually poke assets. There's gotta be a better way, and I'm wondering if there's a "split the difference" approach that might be possible by doing some precalculation over the entirety of VRAM, but for very few unique combinations of sprite properties. This is currently where my thought processes are at.
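To make the precalculation idea concrete, here is a rough C sketch of caching per-sprite work units, keyed off the raw sprite attribute bytes so that a VRAM write to the attribute table can invalidate just that entry. The cost formula reuses the 1 + fetch + output breakdown from earlier in the thread; the struct and field names are made up for illustration and are not the emulator's actual types.

```c
#include <stdint.h>
#include <string.h>

#define NUM_SPRITES 128

typedef struct {
    uint8_t attr[8];   /* raw 8-byte sprite attribute entry, as written to VRAM */
    int     width;     /* decoded pixel width */
    int     bpp;       /* 4 or 8 */
    int     units;     /* precalculated work units when the sprite covers a line */
    int     valid;
} sprite_cache_entry;

static sprite_cache_entry cache[NUM_SPRITES];

/* Recompute one entry; called when its attribute bytes change. */
void sprite_cache_update(int id, const uint8_t *attr, int width, int bpp) {
    sprite_cache_entry *e = &cache[id];
    memcpy(e->attr, attr, 8);
    e->width = width;
    e->bpp   = bpp;
    e->units = 1 + width / (32 / bpp) + width;   /* setup + fetch + output */
    e->valid = 1;
}

/* During a line: spend the budget using cached costs instead of re-deriving them. */
int sprites_that_fit(int line_budget) {
    int id, count = 0;
    for (id = 0; id < NUM_SPRITES && line_budget > 0; id++) {
        if (!cache[id].valid) continue;
        line_budget -= cache[id].units;
        if (line_budget >= 0) count++;
    }
    return count;
}
```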