Jeffrey

New demo uploaded: Wolfenstein 3D - raycasting demo with textures


Wolfenstein 3D - raycasting demo with textures


Raycaster demo (written in C and assembly)

This version is a big improvement in performance: it now runs at 10+ fps! 🙂

I am quite happy with the speed. The demo plays quite nicely now.

See the description of version 1.3.0 below for details.

---
This is a raycaster demo written in C and assembly for the Commander X16. I've been trying out the X16 and thought it would be a cool idea to start working on a raycaster.

Ultimately it would be great if "Wolfenstein 3D" could somehow be ported to the X16. That would be a BIG challenge though! 😉

This is how it looks now:

[Animation: raycasting_demo_1.3.0a.gif]

Speed improvements in 1.3.0:

Now running at 10+ fps! 🙂

- Using zero page addresses for pretty much all my variables
- Using fast multipliers that use "square" tables: https://codebase64.org/doku.php?id=base:seriously_fast_multiplication
- Inlined the fast multipliers, so there is less copying of values and no JSR/RTS
- Re-using the somewhat "static" parts of the multiplications, so they aren't reloaded/recalculated for each ray (this was harder than it sounds; quite a bit of refactoring was done)
    - The cosine and sine fractions are player-related: even though they are sometimes negated, they (that is, their squares) could be reused for (almost) every ray
    - The (square of the) fraction of the tile the player is standing in, used to calculate the initial x/y intercept of each ray, could be reused
- Cleaned up the main loop and several other parts
- Replaced the slow 16-bit divider with a 512-entry table: distance2height (major improvement!!) 🙂
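Two of the tricks above, the square-table ("quarter-square") multiplier and the distance2height lookup, can be sketched in portable C. This is only an illustration under stated assumptions, not the demo's 6502 code. The `FixedMultiplier` struct loosely mirrors the idea of setting up the table pointers for one factor once and reusing them for many products. The table layouts, the 8.8 fixed-point distance format, the 1/32-tile table resolution, and the constants `WALL_SCALE` (wall height in pixels at a distance of one tile) and `MAX_HEIGHT` are all hypothetical; the table only borrows the name distance2height from the post.

```c
#include <stdint.h>

/* Quarter-square multiplication: a*b = floor((a+b)^2/4) - floor((a-b)^2/4).
   Both floors drop the same fraction because a+b and a-b have the same parity. */
static uint16_t sq_sum[511];   /* sq_sum[i]  = floor(i^2 / 4),         i = a + b       */
static uint16_t sq_diff[511];  /* sq_diff[i] = floor((i - 255)^2 / 4), i = b - a + 255 */

/* Hypothetical projection constants for the distance-to-height table. */
#define WALL_SCALE 152u   /* wall height in pixels at a distance of 1.0 tile */
#define MAX_HEIGHT 511u   /* tallest wall the drawing routines can handle */
static uint16_t distance2height[512];   /* index = distance in 1/32 tile */

void tables_init(void)
{
    for (int32_t i = 0; i < 511; i++) {
        int32_t d = i - 255;
        sq_sum[i]  = (uint16_t)(i * i / 4);
        sq_diff[i] = (uint16_t)(d * d / 4);
    }
    distance2height[0] = MAX_HEIGHT;
    for (uint32_t i = 1; i < 512; i++) {
        uint32_t h = WALL_SCALE * 32u / i;   /* height = WALL_SCALE / (i / 32 tiles) */
        distance2height[i] = (uint16_t)(h > MAX_HEIGHT ? MAX_HEIGHT : h);
    }
}

/* The "static" part: table windows for a fixed factor a, set up once... */
typedef struct {
    const uint16_t *sum;    /* sum[b]  == floor((a + b)^2 / 4) */
    const uint16_t *diff;   /* diff[b] == floor((b - a)^2 / 4) */
} FixedMultiplier;

FixedMultiplier mul_set_a(uint8_t a)
{
    FixedMultiplier m = { sq_sum + a, sq_diff + 255 - a };
    return m;
}

/* ...then every product with a new b is two lookups and one subtraction. */
uint16_t mul_by(const FixedMultiplier *m, uint8_t b)
{
    return (uint16_t)(m->sum[b] - m->diff[b]);
}

/* Distance in 8.8 fixed-point tiles -> wall height, with no run-time division. */
uint16_t distance_to_height(uint16_t dist_8_8)
{
    uint16_t index = dist_8_8 >> 3;   /* 8.8 tiles -> 1/32-tile units */
    return distance2height[index > 511 ? 511 : index];
}
```

With these made-up constants a wall one tile away (256 in 8.8) comes out 152 pixels tall and two tiles away 76 pixels; the point is only that the per-ray division turns into a table read.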
 

New in version 1.2.0:

- draw functions have been ported to assembly. Much faster!
- DDA casting functions have been ported to assembly. Much faster!
- drawing textures!
- automatic generation of a routine to draw the ceiling and floor (only 4 cycles per plain pixel)
- automatic generation of around 512 routines for drawing textures (only 8 cycles per texture pixel)
- using joystick controls (you can use the arrow keys and Alt to control, and you can hold keys down)
- a few textures from Wolfenstein 3D (shareware version) have been added (loaded at startup)
- changed the map to look like the first level in Wolfenstein 3D
- added a border around the render area, just like Wolfenstein 3D
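The automatically generated texture routines can be illustrated with a small C generator that emits raw 6502 machine code for one wall height. This is a sketch under assumptions, not the demo's generator: it assumes VERA data port 1 has been pointed at the texture column (increment 1) and data port 0 at the screen column (increment of one screen row) before the routine runs, and it uses nearest-texel sampling (screen row `y` shows texel `64*y/height`). The opcodes are the standard 6502 encodings ($AD = LDA absolute, $8D = STA absolute, $60 = RTS), and `VERA_DATA0`/`VERA_DATA1` are taken as $9F23/$9F24, the X16 addresses of VERA's two data ports. The function names and the cycle counter are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_LDA_ABS 0xAD     /* LDA absolute: 3 bytes, 4 cycles */
#define OP_STA_ABS 0x8D     /* STA absolute: 3 bytes, 4 cycles */
#define OP_RTS     0x60
#define VERA_DATA0 0x9F23u  /* VERA data port 0: screen column */
#define VERA_DATA1 0x9F24u  /* VERA data port 1: texture column */
#define TEX_SIZE   64       /* texels per texture column */

/* Emit one absolute-mode instruction (6502 operands are little-endian). */
static size_t emit_abs(uint8_t *out, size_t pos, uint8_t opcode, uint16_t addr)
{
    out[pos] = opcode;
    out[pos + 1] = (uint8_t)(addr & 0xFF);
    out[pos + 2] = (uint8_t)(addr >> 8);
    return pos + 3;
}

/* Generate the routine for a wall `height` pixels tall into `out`, which must hold
   3 * (TEX_SIZE + height) + 1 bytes. Returns the routine's size in bytes and stores
   its cycle count (excluding the RTS) in *cycles. */
size_t gen_column_routine(uint8_t *out, unsigned height, unsigned *cycles)
{
    size_t pos = 0;
    unsigned reads = 0;   /* texels fetched from port 1 so far */

    *cycles = 0;
    for (unsigned y = 0; y < height; y++) {
        unsigned texel = y * TEX_SIZE / height;   /* nearest texel for screen row y */
        while (reads <= texel) {                  /* read (and skip) up to that texel */
            pos = emit_abs(out, pos, OP_LDA_ABS, VERA_DATA1);
            reads++;
            *cycles += 4;
        }
        pos = emit_abs(out, pos, OP_STA_ABS, VERA_DATA0);  /* A still holds that texel */
        *cycles += 4;
    }
    out[pos++] = OP_RTS;
    return pos;
}
```

For a 64-pixel wall this produces alternating LDA/STA pairs (8 cycles per pixel); taller walls repeat STAs, and shorter walls pick up extra LDAs that only skip texels.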

Usage

Unpack the zip. Make sure you have the .BIN files in the same folder as the source files.

To compile: (this assumes cc65 is installed)

    cl65 -t cx16 -o RAY.PRG ray.c ray_asm.asm -O3

To run:

    x16emu.exe -prg RAY.PRG -run

To play:

    up - move forwards
    down - move backwards
    left - turn left
    right - turn right
    alt-left - strafe left
    alt-right - strafe right

To debug:
    p - turn on log screen
    o - turn off log screen
    t - turn on test ray
    l - rotate test ray left
    j - rotate test ray right

Known issues (1.2.0)

- Sometimes there is a corner of a wall not drawn correctly
- Since there is no real V-sync yet, you get "shearing" on the screen (fixing it requires double buffering)

Next up:
- Lots of speed improvements
- Lots of code cleanup (it's messy now)
- Add a time-per-frame indicator (using a vsync interrupt counter)
- Mooaarr wall-textures!
- Double buffer to prevent shearing
- Show a map on the screen
- Bigger map (limit is now 16x16)
- Opening doors
- Add (scaled) "sprites"
- Lamps (scaled) hanging from ceiling
- Stats in lower part, gun visible
- AI, enemies
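For the "Double buffer to prevent shearing" item, one possible approach on VERA is to keep two bitmaps in VRAM, draw into the hidden one, and flip the layer's base address during vertical blank. The C sketch below covers only the bookkeeping. It assumes a 320-pixel-wide 8bpp bitmap with 152 rows for the 3D view (a hypothetical size), and relies on the layer's TILEBASE register holding VRAM address bits 16:11 in its bits 7:2 (as I read the VERA documentation), so each bitmap has to start on a 2 KB boundary. All names are hypothetical; actually writing the register, and doing so only during vertical blank, is left out.

```c
#include <stdint.h>

#define BUF_BYTES   (320UL * 152UL)                /* hypothetical 8bpp view bitmap */
#define ALIGN_2K(a) (((a) + 0x7FFUL) & ~0x7FFUL)   /* TILEBASE granularity is 2 KB */

typedef struct {
    uint32_t base[2];   /* VRAM addresses of the two bitmaps */
    int shown;          /* index of the bitmap currently displayed */
} DoubleBuffer;

void dbuf_init(DoubleBuffer *d, uint32_t first_free_vram)
{
    d->base[0] = ALIGN_2K(first_free_vram);
    d->base[1] = ALIGN_2K(d->base[0] + BUF_BYTES);
    d->shown = 0;
}

/* VRAM address the renderer should draw the next frame into (the hidden bitmap). */
uint32_t dbuf_draw_target(const DoubleBuffer *d)
{
    return d->base[1 - d->shown];
}

/* Swap buffers; returns the byte to write to the layer's TILEBASE register
   (VRAM address bits 16:11 go into register bits 7:2). */
uint8_t dbuf_flip(DoubleBuffer *d)
{
    d->shown = 1 - d->shown;
    return (uint8_t)(((d->base[d->shown] >> 11) & 0x3F) << 2);
}
```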

Having fun! 🙂

Jeffrey


 

---

This is really cool! I'd really be curious how fast this can get. Skimming the assembly CC65 generates, I see a lot of JSRs. So much branching would certainly slow it down. It should be interesting to see how much faster it could be if written directly in assembly. Compiling with -O is noticeably faster as well.

As a fun sidenote, running this with the emulator in warp mode, it's practically a playable game 😅

---

Trying to draw some textures:

[Image: textures_poc.png]

This is just a PoC for now. It's very slow. But I now know how to make it work in principle.

Now I somehow have to make this perform... 😉

---

Have you heard of the YT channel "One Lone Coder"? His stuff is all in C++ using his own custom game engine, but he covers a lot of relevant topics and I've learned a lot from watching them anyway. He did a video about a fast raycast algorithm which would fit this application. Here's a Link to the video. Not sure if you've already seen it or whether your code essentially does this already, but it might be useful for you.

---
1 hour ago, ZeroByte said:

Have you heard of the YT channel "One Lone Coder?" His stuff is all in C++ using his own custom game engine, but he covers a lot of relevant topics and I've learned a lot watching them anyway. He did a video about a fast raycast algorithm which would fit this application. Here's a Link to the video. Not sure if you've already seen it or whether your code essentially does this already, but it might be useful for you.

Thanks! 🙂

Yes I already saw his video. In fact I am subscribed to his channel 😉. He always creates very instructive videos about how to do things from scratch. He is an inspiration.

For this topic he refers to this blog: https://lodev.org/cgtutor/raycasting.html

That tutorial, however, uses a somewhat different technique than the original Wolfenstein did. The original technique is (I think, but I could be wrong) better suited if you only have 8- to 16-bit fixed-point numbers (and not floating-point numbers).
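To make the fixed-point point concrete, here is the grid-stepping (DDA) loop from the lodev tutorial rewritten with 8.8 fixed-point integers instead of floats. It is a sketch for comparison only: it is not Wolfenstein's intercept-based method and not the demo's code, and the 16x16 map, the 8.8 format and all names are assumptions. The map is assumed to be enclosed by walls so the loop terminates. The returned distance is measured along the ray in units of the direction vector (when the direction is built from a unit view vector plus camera plane, as in the tutorial, that is the perpendicular distance).

```c
#include <stdint.h>

#define MAP_W 16
#define MAP_H 16
#define FIX_SHIFT 8
#define FIX_ONE ((int32_t)1 << FIX_SHIFT)   /* 1.0 in 8.8 fixed point */

typedef struct {
    int tile_x, tile_y;   /* map cell that was hit */
    int32_t dist;         /* ray parameter at the hit, 8.8, in units of the direction */
    int side;             /* 0 = hit while crossing a vertical grid line, 1 = horizontal */
} RayHit;

/* pos_x/pos_y: position in 8.8 tiles (inside the map).
   dir_x/dir_y: ray direction in 8.8, not both zero. */
RayHit cast_ray(uint8_t map[MAP_H][MAP_W],
                int32_t pos_x, int32_t pos_y, int32_t dir_x, int32_t dir_y)
{
    RayHit hit = { (int)(pos_x >> FIX_SHIFT), (int)(pos_y >> FIX_SHIFT), 0, 0 };
    int32_t frac_x = pos_x & (FIX_ONE - 1), frac_y = pos_y & (FIX_ONE - 1);
    int32_t delta_x = 0, delta_y = 0;
    int32_t side_x = INT32_MAX, side_y = INT32_MAX;   /* "never" if that axis does not move */
    int step_x = 0, step_y = 0;

    if (dir_x == 0 && dir_y == 0)
        return hit;
    if (dir_x != 0) {
        /* ray length between two vertical grid lines: 1/|dir_x| */
        delta_x = (FIX_ONE * FIX_ONE) / (dir_x < 0 ? -dir_x : dir_x);
        step_x = dir_x < 0 ? -1 : 1;
        /* ray length to the first vertical grid line */
        side_x = ((dir_x < 0 ? frac_x : FIX_ONE - frac_x) * delta_x) >> FIX_SHIFT;
    }
    if (dir_y != 0) {
        delta_y = (FIX_ONE * FIX_ONE) / (dir_y < 0 ? -dir_y : dir_y);
        step_y = dir_y < 0 ? -1 : 1;
        side_y = ((dir_y < 0 ? frac_y : FIX_ONE - frac_y) * delta_y) >> FIX_SHIFT;
    }

    for (;;) {
        /* advance to whichever grid line the ray reaches first */
        if (side_x < side_y) {
            side_x += delta_x;
            hit.tile_x += step_x;
            hit.side = 0;
        } else {
            side_y += delta_y;
            hit.tile_y += step_y;
            hit.side = 1;
        }
        if (map[hit.tile_y][hit.tile_x] != 0)
            break;
    }
    /* side_* now points one grid line past the wall face, so step back once */
    hit.dist = hit.side == 0 ? side_x - delta_x : side_y - delta_y;
    return hit;
}
```

The per-ray division for `delta_x`/`delta_y` is exactly the kind of operation that would become a lookup table on the X16.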

Here is a video about the original Wolfenstein 3D technique that was very useful to me:

It's fascinating to dive into this stuff and see how brilliant John Carmack was to figure it all out by himself in those days.

Regards,

Jeffrey 

PS. In order to speed up drawing (massively) I am now also generating "hardcoded" machine code for drawing each possible wall-height. Pretty crazy, but it really works well 🙂

---
37 minutes ago, Jeffrey said:

generating "hardcoded" machine code for drawing each possible wall-height.

I'm fairly sure the original Wolfenstein did the same thing (or maybe Doom did). 🙂

---

1 hour ago, desertfish said:

I'm fairly sure the original wolfenstein did the same thing.  (or maybe Doom did) 🙂

Probably Wolf3d. Doom's renderer used multiple passes. I once found a video that showed the rendering in slow motion, so you could sit there and watch it draw the frame.

2 hours ago, Jeffrey said:

PS. In order to speed up drawing (massively) I am now also generating "hardcoded" machine code for drawing each possible wall-height. Pretty crazy, but it really works well 🙂

Hey, whatever it takes to get it to run - I'm really impressed with the progress thus far, man!

I've personally been pondering what I would do if I wanted to port Wolf3d to the X16 - and I'm thinking a speed optimization would be to cut the resolution in half and then use the VERA scaling to scale it back up to full screen, or maybe even 1/4 res upscaled 4x.... i.e. only draw in the upper-left 1/4 or 1/8 of the video memory area.

You could then use a raster IRQ to set VERA back to 1:1 scaling when it reaches the top row of the HUD, and render the HUD on layer1 using tiles.

---
9 hours ago, desertfish said:

I'm fairly sure the original wolfenstein did the same thing.  (or maybe Doom did) 🙂

Indeed. That's where I got the idea from. 🙂

[Screenshot]

The above-mentioned video says: "As there is only a finite number of possible heights, Wolfenstein code generates one routine for every possible height".

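Right now I store the textures in VRAM. When I generate such a routine, it simply looks something like this:

    LDA VERA_DATA1   ; read the next texel (data port 1 auto-increments through the texture)
    STA VERA_DATA0   ; write a pixel (data port 0 auto-increments down the screen column)
    STA VERA_DATA0   ; same texel again: the wall is taller than the texture here
    LDA VERA_DATA1
    STA VERA_DATA0
    LDA VERA_DATA1
    STA VERA_DATA0
    STA VERA_DATA0
    ...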

In that example it writes about 2-3 times more to the screen than it reads from the texture (which is 64x64 pixels). The nice thing about VERA is that you can do this vertically, which suits drawing columns for each ray very well.

This takes less than 8 cycles per pixel. Sometimes you read more than you write (when walls are smaller than the texture). Sometimes you write more than you read (when walls are taller than the texture). All in all, a little less than 8 cycles. I still need to optimize the smaller walls (they dummy-load too much right now, so I probably need a secondary, smaller texture or to double my stride).
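A rough count supports the "less than 8 cycles" figure. Assuming the routine consists only of absolute-mode `LDA VERA_DATA1` / `STA VERA_DATA0` (4 cycles each on the 65C02), that screen row y of a wall h pixels tall shows texel floor(64y/h) of a 64-texel column (nearest-texel sampling, an assumption), and ignoring call overhead, a column with R(h) texel reads costs:

```latex
\begin{aligned}
C(h) &= 4\,\bigl(R(h) + h\bigr), \qquad R(h) = \left\lfloor \frac{64\,(h-1)}{h} \right\rfloor + 1 \\
h \ge 64:\quad R(h) &= 64 \;\Rightarrow\; \frac{C(h)}{h} = 4 + \frac{256}{h} \quad (8 \text{ at } h = 64,\ 6 \text{ at } h = 128) \\
h = 32:\quad R(32) &= 63 \;\Rightarrow\; \frac{C(32)}{32} = \frac{4 \cdot 95}{32} \approx 11.9
\end{aligned}
```

So walls at least as tall as the texture stay at or under 8 cycles per pixel, while short walls spend much of their time on dummy loads of texels that are never drawn, which is what a smaller texture or a doubled port-1 stride would cut down.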

For the ceiling and floor I simply have a single routine with a whole bunch of STA VERA_DATA0 instructions, and I jump into that routine at exactly the right place (with the correct color in A). So those take only 4 cycles per pixel.
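The jump-into-the-middle trick can be sketched in C as a generator plus an entry-offset calculation. The names are hypothetical and `FILL_MAX` (the longest run of ceiling or floor pixels) is an assumed value; the bytes are the standard 6502 encodings ($8D = STA absolute, $60 = RTS) with VERA's data port 0 at $9F23. Because every `STA VERA_DATA0` is 3 bytes, entering the routine `count` instructions before the RTS writes exactly `count` pixels.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_STA_ABS 0x8D     /* STA absolute: 3 bytes, 4 cycles */
#define OP_RTS     0x60
#define VERA_DATA0 0x9F23u
#define FILL_MAX   152u     /* hypothetical: longest ceiling/floor run in pixels */

/* Build FILL_MAX copies of "STA VERA_DATA0" followed by RTS.
   `out` must hold 3 * FILL_MAX + 1 bytes; returns the size in bytes. */
size_t gen_fill_routine(uint8_t *out)
{
    size_t pos = 0;
    for (unsigned i = 0; i < FILL_MAX; i++) {
        out[pos++] = OP_STA_ABS;
        out[pos++] = (uint8_t)(VERA_DATA0 & 0xFF);
        out[pos++] = (uint8_t)(VERA_DATA0 >> 8);
    }
    out[pos++] = OP_RTS;
    return pos;
}

/* Offset from the routine's start at which to enter it so that exactly
   `count` pixels are written before the RTS. */
size_t fill_entry_offset(unsigned count)
{
    if (count > FILL_MAX)
        count = FILL_MAX;
    return 3u * (FILL_MAX - count);
}
```

On the 6502 side the caller would load A with the ceiling or floor color and jump to the routine's base address plus this offset (for example through a JMP (indirect) vector); each pixel then costs only the 4 cycles of one STA absolute.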

I can still speed that up a little by remembering how tall the wall in that column was in the previous frame, so I only have to erase that old wall and not redraw the entire ceiling or floor.
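A C sketch of that bookkeeping, assuming walls are vertically centered in a view of hypothetical height `VIEW_H` = 152 and that the wall routine always redraws the full new wall. If the wall grew, the new wall covers every row the old one did, so no ceiling or floor pixels need writing. If it shrank, only the rows the old wall occupied above and below the new one need the ceiling and floor colors again. The names are hypothetical.

```c
#define VIEW_H 152u   /* hypothetical height of the 3D view in pixels */

typedef struct {
    unsigned ceil_start, ceil_count;    /* rows to repaint with the ceiling color */
    unsigned floor_start, floor_count;  /* rows to repaint with the floor color */
} Repaint;

static unsigned clip_height(unsigned h) { return h > VIEW_H ? VIEW_H : h; }

/* Rows of one column that need ceiling/floor color again when the wall in that
   column changes from old_h to new_h pixels (walls are vertically centered). */
Repaint repaint_for_column(unsigned old_h, unsigned new_h)
{
    Repaint r = { 0, 0, 0, 0 };
    old_h = clip_height(old_h);
    new_h = clip_height(new_h);
    if (new_h < old_h) {   /* a taller wall simply overdraws the old one */
        unsigned old_top = (VIEW_H - old_h) / 2;
        unsigned new_top = (VIEW_H - new_h) / 2;
        r.ceil_start = old_top;
        r.ceil_count = new_top - old_top;
        r.floor_start = new_top + new_h;
        r.floor_count = (old_top + old_h) - (new_top + new_h);
    }
    return r;
}
```

The two counts always add up to `old_h - new_h`, so the cost of a column drops from the full view height to just the change in wall height.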

Fun stuff! 🙂 

 

---
7 hours ago, ZeroByte said:

Hey, whatever it takes to get it to run - I'm really impressed with the progress thus far, man!

I've personally been pondering what I would do if I wanted to port Wolf3d to the X16 - and I'm thinking a speed optimization would be to cut the resolution in half and then use the VERA scaling to scale it back up to full screen, or maybe even 1/4 res upscaled 4x.... i.e. only draw in the upper-left 1/4 or 1/8 of the video memory area.

You could then use a raster IRQ to set VERA back to 1:1 scaling when it reaches the top row of the HUD, and render the HUD on layer1 using tiles.

Thanks. It's a lot of fun doing it! 🙂

Yeah. Right now I estimate that 10-15 fps is achievable with a resolution of 304x152 pixels for the 3D-rendering part, which is the resolution the original Wolfenstein 3D shareware version had (the NTSC version). The PAL version has 182 vertical pixels.

If you want more than 15 fps, the most effective way to get there (I think) would be to scale in width: make each vertical bar 4 pixels wide (instead of 1). That is only a quarter of the work and would likely do the trick (achieving 50-60 fps), and it might still be nice to look at. The sprites might look weird though. Not sure.

Currently I am hand-writing the ray-casting routines (DDA) in assembly, because they are now the bottleneck. It is quite hard and time-consuming. If that works, we'll see what kind of speed comes out of it.

Of course all of the above does not factor in the game itself, just the engine. So it will probably be a bit slower. But the goal of this demo is to see if a raycasting engine is achievable at reasonable speeds on the X16.

Having fun just thinking about this problem! 🙂

PS. Earlier I was thinking of using 100+ sprites and showing them multiple times in a single frame (like I did in the vertical bar scroller), which would effectively put around 500 sprites on screen. Those sprites would be placed very carefully so that 4 of them, stacked vertically, form a vertical bar (with a texture) representing a cast ray, 3 or 4 pixels wide. That would be much faster than blitting to the screen, but I've abandoned that idea because it doesn't seem to work with multiple textures.

---
13 hours ago, Jeffrey said:

Having fun just thinking about this problem!

I know exactly what you mean. I'm the same way.

It's a pity that VERA doesn't have sprite scaling functionality (other than that they get scaled along with everything if you scale the display itself) - if it did, then the sprites in Wolf3d would be a cakewalk.

---
On 3/8/2021 at 9:20 PM, Jeffrey said:

Indeed. Thats where I got the idea from. 🙂


You might want to check out the Coding Secrets channel on YouTube. That guy did a lot of games for Sega (e.g. Sonic, Toy Story 3D) and goes through the code he wrote to do all the 3D effects.

---
On 3/8/2021 at 11:20 PM, Jeffrey said:

For the ceiling and floor I simply have a  single routine with a whole bunch to STA VERA_DATA0's and I jump in that routine at exactly the right place (with the correct color in A).

Considering that the ceiling and floor are just static colors, would it be possible to use either another layer or a raster interrupt halfway down the screen to create this effect?

---
7 hours ago, Elektron72 said:

Considering that the ceiling and floor are just static colors, would it be possible to use either another layer or a raster interrupt halfway down the screen to create this effect?

You'd still have to do some cleanup with the transparent pixels wherever a wall changes sizes between frames.

Sometimes the added overhead of checking for things is slower than just blitting a bunch of pixels. I ran into this when making my C64 version of Flappy Bird. I tried 2 different methods of drawing the pipes whenever the scroll register loops: go to each column where pipes are located and draw them in with one extra blank tile to the right of each pipe, vs. just blit the entire screen one tile to the left and then draw the final column. The first way needs fewer writes, but the extra logic to go through the list of pipe columns, load the height of each set of pipes, and write the different tiles at the proper heights actually ended up taking more raster time than the dumb blit routine. It even blits into the rightmost column, which is about to be overwritten by the new column of tiles being scrolled in: it's faster to just "durrrr blit!" and then go overwrite one column than it is to put in code to skip every 40th tile.

If I were going to use 2 bitmap layers, I would probably do the walls in one layer and the actors in the other (they're not really "sprites", are they?), because you can actually do a few optimizations this way, such as not redrawing the walls if the camera hasn't moved, and only redrawing the columns that have actors in them (enemies still move and animate even if you're not moving the camera). Then I'd just use actual sprites for the gun you're holding, so animating it is nothing but updating the sprite pointers for the animations.

Finally, a raster IRQ to switch back to tile mode on layer1 for the HUD, and you're set.

---
11 hours ago, Elektron72 said:

Considering that the ceiling and floor are just static colors, would it be possible to use either another layer or a raster interrupt halfway down the screen to create this effect?

Yes, it would be possible. But as ZeroByte said, you would still have to clear the parts of the walls that were drawn in the previous frame, and that's just as expensive as writing a single color to it.

It would be beneficial if you only had to draw a single picture (given an empty buffer), but we have to draw over the same buffer again and again. If there were a magic "clear entire video buffer" command, it would certainly help.

---
9 hours ago, Jeffrey said:

Yes it would be possible. But as ZeroByte said, you would still have to clear parts of the walls that were drawn the previous frame and thats just as expensive as writing a single color to it. 

It would be beneficial if you only had to draw a single picture (given an empty buffer), but we have to draw over the same buffer again and again. If there was a magic "clear entire video buffer" command then it would for sure help.   

How much space have you allocated for the buffer? Do you absolutely have to have square pixels, or can you get away with stretching the screen vertically with VSCALE and drawing the same number of columns but fewer pixels per column? You might be able to get two frames in memory at the same time that you can flip between... definitely if you stretch the vertical scale to half the horizontal scale.

---
28 minutes ago, Ed Minchau said:

How much space have you allocated for the buffer? Do you absolutely have to have square pixels,  or can you get away with stretching the screen vertically with VSCALE and draw the same number of columns but fewer pixels per column? Might be able to get two frames in memory at the same time that you can flip between... definitely if you stretch the vertical scale to half the horizontal scale. 

That is a good idea. 🙂 Right now I might have enough room for two buffers (which I want to use to prevent shearing), but I won't have much room left for textures (which are also in VRAM and take around 4 KB each). Stretching the vertical axis would alleviate the VRAM problem, and it might not look so bad. I'll have to look into it; it might be a good option.
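A back-of-the-envelope VRAM budget shows why the vertical stretch helps. It assumes the 3D view is kept as a 320-pixel-wide 8bpp bitmap (VERA bitmaps are 320 or 640 pixels wide) with 152 rows, 64x64 8bpp textures of 4,096 bytes, and the full 128 KB (131,072 bytes) of VRAM. In practice the text layer, font and VERA's register area take some of that space, so these are upper bounds:

```latex
\begin{aligned}
\text{one } 320 \times 152 \text{ buffer} &= 48\,640 \text{ bytes}, \qquad 2 \times 48\,640 = 97\,280 \\
131\,072 - 97\,280 &= 33\,792 \;\Rightarrow\; \lfloor 33\,792 / 4\,096 \rfloor = 8 \text{ textures} \\
\text{half height (VSCALE)}:\ 2 \times 320 \times 76 &= 48\,640 \;\Rightarrow\; \lfloor (131\,072 - 48\,640) / 4\,096 \rfloor = 20 \text{ textures}
\end{aligned}
```

Halving the buffer height frees room for more than twice as many textures.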

Right now, I am still hand-writing all performance-critical functions in assembly. I've got most of it ported, but there are still some leftovers in C (which still drain the speed). I am struggling a bit with the last pieces, but I think I will have a working version (in assembly) pretty soon. I've had many weird bugs, and investigating them is quite challenging and time-consuming...

Regards,

Jeffrey

---
3 hours ago, Jeffrey said:

That is a good idea. 🙂 Right now I might have enough room for two buffers (which I want to use to prevent shearing), but I won't have a lot of room for many textures (that are also in VRAM and take around 4k each). Stretching in the vertical axis will alleviate the VRAM problems and it might not look so bad. Will have to look into it. Might be a good option. 


64x64 textures? They don't have to be. They can be any size you want, since they're not being directly displayed. And if you go with half the vertical resolution on the screen, you can cut those in half too. Is the top half of each texture a mirror image of the bottom half? You can count down as well as up, so that could cut the textures in half as well.
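A sketch of the mirroring idea in C, with hypothetical names: check whether a 64x64 texture is symmetric top-to-bottom, and if so keep only its top 32 rows and map each texel row back into that half. On VERA, reading the second half "counting down" would presumably use the DECR bit of the ADDRx_H register so the data port steps backwards, but that part is not shown. The texture is taken as row-major 8bpp bytes here, which may not match how the demo lays textures out in VRAM.

```c
#include <stdint.h>
#include <string.h>

#define TEX 64

/* 1 if row y equals row TEX-1-y for every y (texture mirrored top/bottom), else 0. */
int is_vertically_mirrored(const uint8_t *tex)
{
    for (int y = 0; y < TEX / 2; y++)
        if (memcmp(tex + y * TEX, tex + (TEX - 1 - y) * TEX, TEX) != 0)
            return 0;
    return 1;
}

/* Row of the stored top half to use for texel row y (0..63) of a mirrored texture. */
int mirrored_row(int y)
{
    return y < TEX / 2 ? y : TEX - 1 - y;
}
```

For a texture that passes the check, this halves its VRAM footprint from 4,096 to 2,048 bytes.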

---
9 hours ago, Ed Minchau said:

64x64 textures? They don't have to be. They can be anything you want,  since they're not being directly displayed. And if you go with half the vertical resolution on the screen you can cut those in half too. Are the top half of the textures mirror images of the bottom half? You can count down as well as up, so that could cut the textures in half as well. 

Most textures are not vertically mirrored. Some might be, though. Good idea to exploit that where possible. 🙂

My goal for now (for the demo) is to get the same look and feel as Wolfenstein 3D and see how fast that runs on the X16, optimizing for that first. So for now I am using the original 64x64 textures and keeping the original screen resolution. Then we'll see how fast we can get that.

If performance is (too) low, halving the vertical screen resolution (and maybe the texture resolution) is certainly an option. Other tricks are possible as well. But I think those are compromises to be made later on.

---
12 hours ago, Jeffrey said:

 

Right now, I am still handwriting all performance critical functions into assembly. Got it mostly ported, but still some leftovers in c (which still drain the speed). I am struggling a bit with the last pieces, but I think I will get a working version (in assembly) pretty soon. I've had many weird bugs and investigating them is quite a challenge and time consuming...

 

There's something else that might speed things up for you on the programming side: you can write assembly language code that writes assembly language code. Lots of what you're writing is just various LDA VERA_DAT_1 and STA VERA_DAT_0 in various combinations. You could pick some vertical scale to display, say 152 pixels, and have an assembly language routine generate all of those height scale subroutines for you automatically. 

---
7 minutes ago, Ed Minchau said:

There's something else that might speed things up for you on the programming side: you can write assembly language code that writes assembly language code. Lots of what you're writing is just various LDA VERA_DAT_1 and STA VERA_DAT_0 in various combinations. You could pick some vertical scale to display, say 152 pixels, and have an assembly language routine generate all of those height scale subroutines for you automatically. 

Yes. I do that now. I generate 512 routines for 512 possible heights. See my previous posts about it.

---
18 hours ago, Jeffrey said:

Yes. I do that now. I generate 512 routines for 512 possible heights. See my previous posts about it.

OK, due to VERA's auto-increment you're actually going to have faster software than the original Wolfenstein: 8 cycles per pixel instead of 15. This version might be better than the original.

---
2 hours ago, Ed Minchau said:

OK, due to VERA's autoincrement you're actually going to have faster software than the original Wolfenstein. 8 cycles per pixel instead of 15.  This version might be better than the original.

I doubt it. The 286 was 16-bit, and apparently it was capable of writing 2 pixels at a time to VGA. They reached 60 fps, I believe.

We could maybe reach 15 fps (based on some hand calculations I did). And that's hard.

---
34 minutes ago, Jeffrey said:

We could maybe reach 15 fps (when I do some hand calculations). And thats hard.

With appropriate graphics even 10 fps can be pretty pleasant to look at. So I think it's worth a shot.

---
3 hours ago, Jeffrey said:

I doubt it. The 286 was 16 bit and apparently they capable of writing 2 pixels at the time to VGA. They reached 60 fps i believe.

We could maybe reach 15 fps (when I do some hand calculations). And thats hard.

If you do exactly what Wolfenstein did, you'll have the same limitations on an 8-bit machine, yeah. But there are shortcuts that they didn't try until Doom, like binary space partitioning, that could help.
