Everything posted by Jeffrey

  1. Sounds like a good plan. I first wrote the demo in C as well, since that is much easier to understand and to debug. I will have to experiment a lot with your code to understand how it works and how I can incorporate it into the Wolf3D game/demo. I also plan to completely port the Wolf3D demo to assembly, because the C code takes up quite a lot of RAM and handling both C and asm is quite inconvenient. I am accustomed to PHP and Python, so that should be no problem. I think what you are doing will be used by many others in the future. I hope the Wolf3D project is just the beginning... Hopefully next week I will have some more time to spend on all of this. Currently doing lots of work on our house. Regards, Jeffrey
  2. OMG!! This is soo cool! Kudos for making this work so well already. Regards, Jeffrey
  3. I think you are looking at the compiled C code. There is an assembly version in the .asm file. The C version is just for testing purposes. Edit: in fact, I call the asm version from C in different ways (for easier debugging). And some of the C code I haven't converted to assembly yet because it's not very performance critical atm (like drawing the menu once or clearing the render part once).
  4. FYI: the next couple of days I will be very busy IRL
  5. Thanks! I think this is a sound idea. Maybe a binary search approach would go well with this. Not sure how the overhead would compare to the gains with this technique. Will have to investigate.
  6. Cool!! Thanks in advance! It would be so cool to have some music.
  7. Short update: now running at 10+ fps. The optimizations I did (on top of the previous ones that got me to 7.5 fps) are:
       • inlining the fast multipliers, so less copying of values and no jsr and rts
       • re-using the somewhat "static" parts of the multiplications, so they won't be re-loaded each time (this was harder than it sounds, quite a bit of refactoring done)
       • cosine and sine fractions are player-related, and even though they are negated sometimes, they (that is, their squares) could be reused for (almost) each ray
       • the (square of the) fraction of the tile the player is standing in, used for calculating the initial x/y interception for each ray, could be reused
       • cleaned up the main loop and several other parts
       • replaced the slow 16-bit divider with a 512-entry table: distance2height (major improvement!!)
     I am quite happy with the speed. The demo plays quite nicely now.
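A minimal C sketch of the table idea in the post above: a division per ray is replaced by one lookup in a 512-entry table. Only the table size and the name distance2height come from the post. The scale constant, the 7-bit shift used to form the index, and the cap at 255 are assumptions made purely for illustration.

```c
#include <stdint.h>

#define TABLE_SIZE        512     /* entry count, from the post */
#define HEIGHT_SCALE      16384u  /* hypothetical projection constant */
#define MAX_RENDER_HEIGHT 255u    /* hypothetical cap so a height fits in a byte */

static uint8_t distance2height[TABLE_SIZE];

/* Fill the table once at startup. Index 0 means "touching the wall",
   so it gets the maximum height instead of a division by zero. */
void init_distance2height(void) {
    for (unsigned i = 0; i < TABLE_SIZE; i++) {
        unsigned h = (i == 0) ? MAX_RENDER_HEIGHT : HEIGHT_SCALE / i;
        distance2height[i] = (uint8_t)(h > MAX_RENDER_HEIGHT ? MAX_RENDER_HEIGHT : h);
    }
}

/* Per ray: drop the low 7 bits of the 16-bit distance (assumed) to get
   an index in 0..511, then read the precomputed height. */
uint8_t wall_height(uint16_t distance) {
    return distance2height[distance >> 7];
}
```

The divide now happens 512 times at startup instead of once per ray per frame, at the cost of 512 bytes and some distance precision.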
  8. Ah right. I think it's around 40 directions. But I'm not sure. I just took Wolf3D and tried to turn around. It sounds like a lot of data you want to store, but I haven't done the math on it. Right now I spend a lot of RAM on the generated texture-draw code.
  9. There are 304 ray directions in a single screen. The FOV is 60 degrees. So there are 6*304 = 1824 possible ray directions in a full circle. For tan and invtan I use tables a quarter of that size. I will have to look into their code in more detail to figure out what they do exactly regarding the resolution/angles/aspect ratios etc. Edit: when looking at their code it seems they use 3600 possible "fine" directions (for their tan-table lookup), which is about double what I use.
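The arithmetic in the post above (304 rays per 60-degree screen, so 6*304 = 1824 directions per full turn and 456 per quadrant) can be sketched in C as follows. The function and type names are hypothetical, and whether the real tables are mirrored in alternate quadrants is not stated in the post, so this sketch only splits a direction into quadrant and table index.

```c
#include <stdint.h>

#define RAYS_PER_SCREEN 304                    /* from the post */
#define DIRECTIONS      (6 * RAYS_PER_SCREEN)  /* 360/60 screens per turn = 1824 */
#define QUARTER         (DIRECTIONS / 4)       /* 456 entries per quadrant table */

/* Hypothetical helper type: which quadrant a direction falls in,
   and the index to use in a quarter-size tan/invtan table. */
typedef struct {
    uint8_t  quadrant;  /* 0..3 */
    uint16_t index;     /* 0..455 */
} QuadrantAngle;

QuadrantAngle split_direction(uint16_t direction) {
    QuadrantAngle q;
    direction %= DIRECTIONS;                    /* wrap around a full turn */
    q.quadrant = (uint8_t)(direction / QUARTER);
    q.index    = (uint16_t)(direction % QUARTER);
    return q;
}
```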
  10. The ray direction is simply an index: for every possible angle I have an index. This is used for lookup in the sine, cosine, tan and invtan tables. The ray starting position is a 16-bit position (tile pos + pos in tile). And I use the same kind of variables as Wolf3D for stepping/intercepting. So that part is pretty similar I think. But my 286 is also very rusty. I used the video to inform me about how Id did it and figured out a way to implement it on the 6502/X16/VERA.
  11. I essentially use a 2-byte fixed-point notation: one byte for the tile position and one byte for the position within a tile. For multiplication of an 8-bit fraction (like sine and cosine) and a 16-bit position (tile + position within it) I temporarily extend to 24 bits and keep only the highest 16 bits. All unsigned numbers. In order to prevent negative numbers as much as possible I normalize every ray direction as if it was within the first 90 degrees. I currently have 4 maps in memory to accommodate that. From what I understand from the video mentioned earlier, Id used 2 bytes for both the tile position and the position within the tile (so double what I use). But the precision I get seems enough for now.
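The multiplication described in the post above can be sketched in C as below. The function name is hypothetical. The position is treated as 8.8 fixed point (tile byte plus in-tile byte) and the fraction as 0.8, so the full product needs at most 24 bits and the top 16 bits are again an 8.8 value.

```c
#include <stdint.h>

/* Multiply an unsigned 8.8 fixed-point position by an unsigned 0.8
   fraction (such as a sine or cosine magnitude). The product fits in
   24 bits; dropping the lowest 8 bits keeps the highest 16. */
uint16_t mul_pos_by_fraction(uint16_t position, uint8_t fraction) {
    uint32_t product = (uint32_t)position * fraction;  /* at most 24 bits */
    return (uint16_t)(product >> 8);                   /* back to 8.8 */
}
```

For example, position 0x0200 (tile 2, offset 0) times fraction 0x80 (one half) gives 0x0100 (tile 1, offset 0). Note that 0xFF is the largest fraction, so an exact 1.0 cannot be represented in this format.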
  12. Short update: the latest release (version 1.2.0) ran at 5 fps. My local version now runs at 7.5 fps. Around a 50% gain in speed. I'm currently optimizing the dda-algorithm. So far I did two things:
       • Using zero page addresses for pretty much all my variables (10% gain)
       • Now I use fast multipliers that use "square" tables: https://codebase64.org/doku.php?id=base:seriously_fast_multiplication (40% gain)
     Still more speed to be gained.
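The "square" table trick in the post above rests on the identity a*b = floor((a+b)^2/4) - floor((a-b)^2/4), which is exact for integers because a+b and a-b always have the same parity. A minimal C sketch follows, with hypothetical names. On the 6502 the table would normally be split into low-byte and high-byte halves, which this sketch does not model.

```c
#include <stdint.h>

/* quarter_square[n] = floor(n*n / 4) for n = 0..510.
   The largest entry is 510*510/4 = 65025, which fits in 16 bits. */
static uint16_t quarter_square[511];

void init_quarter_square(void) {
    for (uint32_t n = 0; n < 511; n++) {
        quarter_square[n] = (uint16_t)((n * n) / 4);
    }
}

/* 8-bit by 8-bit unsigned multiply using two lookups and one subtraction. */
uint16_t fast_mul8(uint8_t a, uint8_t b) {
    uint16_t sum  = (uint16_t)(a + b);                                /* 0..510 */
    uint16_t diff = (a > b) ? (uint16_t)(a - b) : (uint16_t)(b - a);  /* 0..255 */
    return (uint16_t)(quarter_square[sum] - quarter_square[diff]);
}
```

When one operand stays fixed across many multiplications, the pointers derived from it can be set up once, which may be what makes re-using the "static" parts of a multiplication worthwhile.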
  13. Thanks for the idea. I'm pretty sure though that doing floating point math on a 6502 CPU (for this purpose) is slower than doing fixed point processing. The 6502 is not well optimized for doing floating point math. Floating point numbers are (in general) probably easier to deal with, since handling fixed point 8 or 16 bit numbers means a lot of fiddling to make everything fit and not "break". Floating point numbers have a real advantage when it comes to convenience. And if the CPU (or GPU) is hardware-optimized for it, it's most definitely the better solution.
  14. Fascinating stuff. I would like to dive a little deeper into this when I have some time. It must be an interesting experience, this project, right?
  15. Where did you find the original sound files? Do you have any tips on what to read about these kinds of FM systems/chips? I could probably use some help on the sound/music front.
  16. Thanks. That would be too much RAM usage. Right now the longest routine is 64 reads and 182 writes. So that's (64+182)*3 = 738 bytes. So I reserve 1kb per routine now (for fast access to the routines). That covers all of banked RAM right now. I can probably pack it better, but 512 routines times the number of textures would be way too much memory. Another way to reduce (dummy) reads is to have (smaller) textures in normal memory (but "striped" pixel by pixel) and simply hard code the needed read addresses (with an x-index for the texture index) in each routine for all walls smaller than the texture height. For example:
       LDA $5603, X
       STA VERA_DATA0
       LDA $5719, X
       STA VERA_DATA0
       ...
     Where X contains the texture index.
  17. What I am noticing is that the raycasting itself (the dda algorithm) is right now around twice as expensive as the blitting to the screen. So halving vertically won't give you much speed improvement. Halving horizontally would certainly help. What is more interesting right now is how to do the dda-algorithm quickly in assembly. Right now, it does for each ray:
       • a tan() and inverse-tan() lookup resulting in two 16-bit values (x_step and y_step)
       • two multiplications for determining the initial intersection points inside the cell you are standing in (x_intercept and y_intercept)
       • quite a lot of branches to implement the logic used by the dda-algorithm (including copying 16-bit numbers)
       • several decrements, increments, subtractions and additions of 16-bit numbers
       • bit-shifts to do a lookup in the world-map table(s)
       • two multiplications of a 16-bit and an 8-bit value (using x, y distance and cos/sin to get to the distance from the camera plane)
       • a divide of a 16-bit value (the distance to the wall) by a 16-bit constant resulting in a wall height (16-bit) --> expensive! (want to use lookup tables)
       • a capping of the wall height (16-bit) into a render height (byte)
       • lots of small details
     For setting up the rays I also change the input so that I only have to do the logic for one quadrant. My gut feeling is that the above should take (maybe) several hundred cycles. Maybe 300-400? So 304 rays * 300-400 cycles = roughly 90,000 to 120,000 cycles. So maybe 1 tick. Yet it is spending about 7-8 ticks now. So much room for improvement I think. Basically I implement the logic described in this video: [video]
     It would be cool if we could iterate together by suggesting / showing to each other which example assembly snippets would be faster, in order to bring down the cycle count needed for this algorithm. First I have to release though. So back to doing some (much needed) cleanup again.
  18. It's starting to work in assembly: right now it's about 5 fps. Can be improved quite a bit. I'll have to do some cleaning up and I would like to add some small features (maybe more than one texture). Then I'll probably release a new version. Have fun!
  19. Yes. Those sorts of shortcuts will help (although I am starting now with how Wolf 3D did the raycasting). A more fundamental problem is the cycle budget: blitting 304*152 pixels using 8 cycles per pixel takes around 370k cycles. Per vsync tick we only have 133k cycles available. So 60 fps seems out of reach for the X16. That's without doing any raycasting/space partitioning, sprite scaling, AI etc. But that's ok. I'm fine with 10-15 fps.
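The budget in the post above can be checked with a few lines of C. The 8 MHz clock and 60 Hz vsync are assumptions here (they match the 133k cycles per tick quoted in the post), and the function names are hypothetical.

```c
#include <stdint.h>

#define CPU_HZ   8000000u  /* assumed clock, consistent with 133k cycles per tick */
#define VSYNC_HZ 60u       /* assumed vsync rate */

/* Cycles available between two vsync ticks. */
uint32_t cycles_per_tick(void) {
    return CPU_HZ / VSYNC_HZ;
}

/* Cycles needed to blit a width x height area at a fixed cost per pixel. */
uint32_t blit_cycles(uint32_t width, uint32_t height, uint32_t cycles_per_pixel) {
    return width * height * cycles_per_pixel;
}

/* Whole vsync ticks needed for a given amount of work (rounded up). */
uint32_t ticks_needed(uint32_t cycles) {
    uint32_t per_tick = cycles_per_tick();
    return (cycles + per_tick - 1) / per_tick;
}
```

With the numbers from the post, blit_cycles(304, 152, 8) is 369,664 cycles, which needs 3 ticks of 133,333 cycles each. Blitting alone would therefore cap a vsync-locked frame rate at 20 fps under these assumptions, before any raycasting.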
  20. I doubt it. The 286 was 16-bit and apparently they were capable of writing 2 pixels at a time to VGA. They reached 60 fps I believe. We could maybe reach 15 fps (going by some hand calculations). And that's hard.
  21. Yes. I do that now. I generate 512 routines for 512 possible heights. See my previous posts about it.
  22. Most textures are not vertically mirrored. Some might be though. Good idea to exploit that where possible. My goal for now (the demo) is to have the same look and feel as Wolfenstein 3D, and to see how fast that runs on the X16 (and to optimize for that first). So I am using the original 64x64 textures for now and keeping the original screen resolution. Then we'll see how fast we can get that. If performance is (too) low, halving the vertical screen resolution (and maybe the texture resolution) is certainly an option. Other tricks are possible as well. But those are compromises to be made later on I think.
  23. That is a good idea. Right now I might have enough room for two buffers (which I want to use to prevent shearing), but I won't have a lot of room for many textures (that are also in VRAM and take around 4k each). Stretching in the vertical axis will alleviate the VRAM problems and it might not look so bad. Will have to look into it. Might be a good option. Right now, I am still hand-writing all performance critical functions in assembly. Got it mostly ported, but there are still some leftovers in C (which still drain the speed). I am struggling a bit with the last pieces, but I think I will get a working version (in assembly) pretty soon. I've had many weird bugs, and investigating them is quite a challenge and time consuming... Regards, Jeffrey