Jeffrey

Members
  • Posts: 47
  • Days Won: 5

Everything posted by Jeffrey

  1. I start measuring time when the 3D part starts (the word "Oxygene" in yellow). Apparently this really pushes the limits of even the emulator: it has to work quite hard ;).
  2. Yes. When audio is turned off, there is no "streaming" going on during the drawing of the polygons. A time of 1:36:90 is achievable on real hardware. In fact (if I can find the time to implement the newer version) I believe a time of 1:30 (and probably 1:25) is possible. That's 20 fps!

    In essence (when audio is off): the scene files are loaded at the beginning (into banked ram). The loading right now still uses the kernal LOAD-function and loads from the host (not from the simulated SD). So that part is faster than on real hardware. This loader can however be replaced by an SD-loader when you put it in a real X16. I didn't bother to do this (yet). But that's just initial loading time. The playback is started when all polygon-data (640 KB) is loaded into ram. After that there is no loading going on. Also, I do not "touch" or "prep" the data before starting the playback (that's in the spirit of the competition). More info about the original scene files and competition can be found here: http://arsantica-online.com/st-niccc-competition/

    It's actually pretty crazy how much work the 6502 can do when you really keep improving your design: my first version was around 4 minutes ;). Now it does it so much faster. I can probably make an (instructive) video about what process I went through.

    Specifics: The auto-incrementer from VERA helps quite a lot to speed up the process: it takes only 2 cycles per pixel (= 1 "STA VERA_data0" per 2 pixels) to blit a horizontal line to the buffer. This is where the X16 is faster than other platforms. Of course: packing 2 pixels into 1 byte (and unsetting/setting the incrementer in the meantime) is slower than on other platforms (the setup cost for VERA takes quite some time). This is why (on the X16) there isn't that much time difference between 8-bit pixels and 4-bit pixels.

    As a sidenote: I "shrink/crop" the screen to 256x200 pixels. All polygon data (x and y coordinates) are between 0 and 255 and fit nicely in a byte. This suits an 8-bit cpu very well. But VERA still uses a 320-pixel-wide screen buffer (even if you only see 256 pixels horizontally), so determining the vram-address for a given x and y is not very "elegant". Lots of work is done to mitigate the problems that arose from that. In this version I have several lookup tables.

    Below is my new design of the core loop btw. It's really nuts. It requires many variants of (slightly) different code. Has very intricate jump-tables (with 64k entries!). Switches banks constantly. Uses two ports of VERA (in two different ways) etc. But it should be quite a lot faster! Using everything the X16 has got where it helps.

    Edit: the forum degrades the diagram picture for me. I don't understand why it does that.
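To illustrate the address problem described above, here is a minimal Python sketch (not the demo's actual code) of a row lookup table for a 320-pixel-wide, 4-bits-per-pixel buffer. The names `ROW_ADDR_LO`, `ROW_ADDR_HI`, `vram_address` and `is_low_nibble` are hypothetical, the buffer is assumed to start at VRAM address 0, and it is assumed that the left pixel of each byte sits in the high nibble.

```python
# Sketch only: assumes a 4bpp bitmap, 320 pixels wide, starting at VRAM address 0.
BYTES_PER_ROW = 320 // 2   # 2 pixels per byte
ROWS = 200

# A 6502 would keep the row start addresses in two byte-wide tables (low/high),
# so that y * 160 never has to be computed at draw time.
ROW_ADDR_LO = [(y * BYTES_PER_ROW) & 0xFF for y in range(ROWS)]
ROW_ADDR_HI = [(y * BYTES_PER_ROW) >> 8 for y in range(ROWS)]


def vram_address(x, y, base=0):
    """Byte address of pixel (x, y), with x and y both fitting in one byte."""
    row_start = (ROW_ADDR_HI[y] << 8) | ROW_ADDR_LO[y]
    return base + row_start + (x >> 1)


def is_low_nibble(x):
    """True when pixel x is the right-hand pixel of its byte (assumed layout)."""
    return (x & 1) == 1
```

Because 200 rows of 160 bytes is 32000 bytes, every row start still fits in 16 bits, which is why two 200-entry byte tables are enough in this sketch.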
  3. Yeah. That time was recorded by hand and hardcoded in the demo. It might help if you run it again: the loading of the audio files can slow it down too. If you run it again, the files are probably in cache.
  4. This version uses 4 bits per pixel (16 colors). The 8-bit version is a little slower, but not by much. The cost of packing 2 pixels per byte is quite high, and in 256c mode you can "re-use" colors (by "rotating" the palette) so you don't have to clear the screen that much. Attached is my (technical) design for the version I just released. It probably requires extra explanation, but it might give you an idea about the structure of the polygon drawing routine. BTW: as Oziphanto explained, this demo (the original and this one) is drawing 2D-polygons. It's not really doing any real 3D math. So the demo shows how fast you can draw on a machine, not how fast you can do 3D math.
  5. Version 1.0.0

    409 downloads

    This is the release of a STNICCC Demo Remake for the Commander X16! I have been (silently) working on this for the last couple of weeks/months. It is time to release it :). Let's just say: the Commander X16 is far more powerful than I had thought!

    Here is a video of it running:

    Enjoy!

    Regards, Jeffrey

    ---

    PS. There was an earlier attempt to remake this demo on the X16 (done by Oziphanto on youtube). Oziphanto did a very nice comparison video of the X16 with several other machines of the 8-bit and 16-bit era:

    He also re-created this demo, but (in my opinion) did not do such a good job extracting everything out of the X16: his demo ran in 2:32. The remake I made does it in 1:39! His benchmark comparison should therefore be updated:

    Keep in mind the Commander X16 only has:
    - An 8-bit 6502 cpu (8 MHz)
    - No DMA
    - No Blitter

    Yet it keeps up with 16-bit machines like the Amiga! (actually it's even faster right now)

    ---

    Extra notes:
    - This only works on the x16 emulator with 2 MB of RAM
    - It uses the original data (but it's split into 8 KB blocks, so it can fit into banked ram)
    - Waaaayyy too much time is spent on the core-loop to make it perform *this* fast!
    - My estimate is that it can be improved by another 10-15 seconds (I have a design ready, but it requires a re-write of the core-loop)
    - It uses a "stream" of audio-file data and produces 24 kHz mono sound (this will not work on the real x16, since loading the files that fast is a feature of the emulator only)

    Here is a version without audio (so this should run on a real x16):

    And it runs even faster (1:36:90)
  6. STNICCC Commander X16 Demo Remake (View File): the file submission for the Version 1.0.0 release above, posted with the same description. Submitter: Jeffrey. Submitted: 05/21/21. Category: Demos.
  7. Just a little personal update: I have been very busy lately IRL. But I will be returning to this project. Also, the last few days/weeks I have been working on a (completely) new demo. And I am very excited about it :). Let's just say that the x16 is much more capable than I had thought...
    Wow! Very cool demo! How did you do this?
  8. Sounds like a good plan. I first wrote the demo in C as well, since that is much easier to understand and to debug. I will have to experiment a lot with your code to understand how it works and how I can incorporate it into the Wolf3D game/demo. I also plan to completely port the Wolf3D demo to assembly, because the C code takes up quite a lot of ram and handling both C and asm is quite inconvenient. I am accustomed to php and python so that should be no problem. I think what you are doing will be used by many others in the future. I hope the Wolf3D project is just the beginning... Hopefully next week I will have some more time to spend on all of this. Currently doing lots of work on our house. Regards, Jeffrey
  9. OMG!! This is soo cool! Kudos for making this work so well already. Regards, Jeffrey
  10. I think you are looking at the compiled C code. There is an assembly version in the .asm file. The C version is just for testing purposes. Edit: in fact, I call the asm version from C in different ways (for easier debugging). And some of the C code I didn't convert to assembly yet because it's not very performance critical atm (like drawing the menu once or clearing the render part once).
  11. Thanks! I think this is a sound idea. Maybe a binary search approach would go well with this. Not sure how the overhead would compare to the gains with this technique. Will have to investigate.
  12. Cool!! Thanks in advance! It would be so cool to have some music.
  13. Short update: now running at 10+ fps. The optimizations I did (on top of the previous ones that got me to 7.5 fps) are:
      - inlining the fast multipliers, so less copying of values and no jsr and rts
      - re-using the somewhat "static" parts of the multiplications, so they won't be re-loaded each time (this was harder than it sounds, quite a bit of refactoring done):
          - cosine and sine fractions are player-related, and even though they are negated sometimes, they (that is, their squares) could be reused for (almost) each ray
          - the (square of the) fraction of the tile the player is standing in (used for calculating the initial x/y interception for each ray) could be reused
      - cleaned up the main loop and several other parts
      - replaced the 16-bit slow divider with a 512-entry table: distance2height (major improvement!!)
      I am quite happy with the speed. The demo plays quite nicely now.
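As a rough illustration of the last optimization (a table instead of a divide), here is a minimal Python sketch. Only the 512-entry size and the idea of mapping distance to wall height come from the post; the `SCALE` and `MAX_HEIGHT` constants and the function name `wall_height` are made up for the example.

```python
# Sketch only: SCALE and MAX_HEIGHT are hypothetical values, not the demo's.
TABLE_SIZE = 512
SCALE = 4096        # assumed projection constant: height = SCALE / distance
MAX_HEIGHT = 152    # assumed height of the render area in pixels

# Entry 0 is clamped by hand so no division by zero ever happens.
DISTANCE_TO_HEIGHT = [MAX_HEIGHT] + [
    min(MAX_HEIGHT, SCALE // distance) for distance in range(1, TABLE_SIZE)
]


def wall_height(distance):
    """Look up the wall height for a distance; far distances share the last entry."""
    return DISTANCE_TO_HEIGHT[min(distance, TABLE_SIZE - 1)]
```

The division is paid once when the table is built; per column, only an indexed read remains, which is the kind of trade a 6502 without a hardware divider benefits from.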
  14. Ah right. I think it's around 40 directions. But I'm not sure. I just took Wolf3D and tried to turn around. It sounds like a lot of data you want to store, but I haven't done the math on it. Right now I spend a lot of RAM on the generated texture-draw code.
  15. There are 304 ray directions in a single screen. The FOV is 60 degrees. So there are 6*304 = 1824 possible ray directions. For tan and invtan I use tables a quarter of that size. I will have to look into their code in more detail to figure out what they do exactly regarding the resolution/angles/aspect ratios etc. Edit: when looking at their code it seems they use 3600 possible "fine" directions (for their tan-table lookup), which is about double what I use.
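The arithmetic above can be sketched in a few lines of Python. This is only an illustration under the assumption that the direction indices are evenly spaced over the full circle; the names `TAN_TABLE` and `split_direction` are hypothetical, and the real tables would hold fixed-point values rather than floats.

```python
import math

COLUMNS_PER_SCREEN = 304
FOV_DEGREES = 60

# 360 / 60 = 6 screens' worth of columns make up a full turn.
DIRECTIONS = COLUMNS_PER_SCREEN * (360 // FOV_DEGREES)
QUARTER = DIRECTIONS // 4

# One quadrant of tangent values; the other quadrants follow by symmetry.
TAN_TABLE = [math.tan(2 * math.pi * i / DIRECTIONS) for i in range(QUARTER)]


def split_direction(index):
    """Return (quadrant, offset within quadrant) for any direction index."""
    return divmod(index % DIRECTIONS, QUARTER)
```

With these numbers a full turn has 1824 directions and each quarter-size table has 456 entries.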
  16. The ray direction is simply an index: for every possible angle I have an index. This is used for lookup in sine, cosine, tan and invtan tables. The ray starting position is a 16-bit position (tile pos + pos in tile). And I use the same kind of variables as Wolf3D for stepping/intercepting. So that part is pretty similar I think. But my 286 is also very rusty. I used the video to inform me about how Id did it and figured out a way to implement it on the 6502/X16/Vera.
  17. I essentially use 2-byte fixed-point notation. I use one byte for the tile position and one byte for the position within a tile. For multiplication of an 8-bit fraction (like sine and cosine) and a 16-bit position (tile + position within it) I temporarily extend to 24 bits and keep only the highest 16 bits. All unsigned numbers. In order to prevent negative numbers as much as possible I normalize every ray direction as if it was within the first 90 degrees. I currently have 4 maps in memory to accommodate that. From what I understand from the video mentioned earlier, Id used 2 bytes for both the tile position and the position within the tile (so double what I use). But the precision I get seems enough for now.
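A minimal Python sketch of the multiplication described above, assuming the 8-bit fraction represents a value in [0, 1) scaled by 256. The helper names `make_position` and `mul_fraction` are hypothetical.

```python
def make_position(tile, offset):
    """Pack a tile number and a position within the tile into 16 bits (8.8)."""
    return ((tile & 0xFF) << 8) | (offset & 0xFF)


def mul_fraction(position, fraction):
    """Multiply an unsigned 8.8 position by an unsigned 0.8 fraction.

    The intermediate product needs up to 24 bits; dropping the lowest byte
    keeps the highest 16 bits, which is again an 8.8 value.
    """
    assert 0 <= position <= 0xFFFF and 0 <= fraction <= 0xFF
    product = position * fraction      # 16 bits x 8 bits -> up to 24 bits
    return product >> 8                # keep only the highest 16 bits
```

For example, tile 3 plus half a tile (3.5) times a fraction of 128/256 (0.5) gives 0x01C0, i.e. tile 1 plus 192/256 (1.75).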
  18. Short update: the latest release (version 1.2.0) ran at 5 fps. My local version now runs at 7.5 fps. Around a 50% gain in speed. I'm currently optimizing the dda-algorithm. So far I did two things:
      - Using zero page addresses for pretty much all my variables (10% gain)
      - Using fast multipliers that use "square" tables: https://codebase64.org/doku.php?id=base:seriously_fast_multiplication (40% gain)
      Still more speed to be gained.
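The "square" table trick rests on the identity a*b = ((a+b)^2 - (a-b)^2) / 4. Below is a minimal Python sketch of the idea for unsigned 8-bit operands; it shows the principle only, not the 6502 routine from the linked page, and the names `QUARTER_SQUARES` and `fast_mul` are hypothetical.

```python
# floor(n^2 / 4) for every possible sum of two 8-bit values (0..510).
QUARTER_SQUARES = [(n * n) // 4 for n in range(511)]


def fast_mul(a, b):
    """Multiply two unsigned 8-bit values using two lookups and a subtraction.

    a + b and a - b always have the same parity, so the rounding in the two
    table entries cancels out and the result is exact.
    """
    return QUARTER_SQUARES[a + b] - QUARTER_SQUARES[abs(a - b)]
```

On a CPU without a multiply instruction this swaps a shift-and-add loop for table reads, which fits the large gain reported above.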
  19. Thanks for the idea. I'm pretty sure though that doing floating point math on a 6502 CPU (for this purpose) is slower than doing fixed point processing. The 6502 is not well optimized for doing floating point math. Floating point numbers are (in general) probably easier to deal with, since handling fixed point 8 or 16 bit numbers means a lot of fiddling to make everything fit and not "break". Floating point numbers have a real advantage when it comes to convenience. And if the CPU (or GPU) is hardware-optimized for it, it's most definitely the better solution.
  20. Fascinating stuff. I would like to dive a little deeper into this when I have some time. It must be an interesting experience, this project, right?
  21. Where did you find the original sound files? Do you have any tips on what to read about these kinds of FM systems/chips? I could probably use some help on the sound/music front.