Jump to content
Jeffrey

New demo uploaded: Wolfenstein 3D - raycasting demo with textures

Recommended Posts

16 minutes ago, Jeffrey said:

Thanks for the idea. 🙂

I'm pretty sure though that doing floating point math on a 6502 CPU (for this purpose) is slower than doing fixed point processing. The 6502 is not well optimized for doing floating point math.

Floating point numbers are (in general) probably easier to deal with, since handling fixed point 8 or 16 bit numbers means a lot off fiddling to make everything fit and not "break". Floating point numbers have a real advantage when it comes to conveniency. And if the CPU (or GPU) is hardware-optimized for it, its most definitely the better solution.

Maybe a minifloat would work? 🤣

Share this post


Link to post
Share on other sites
21 hours ago, Jeffrey said:

Where did you find the originel sound files? Do hou have and tips to rest about these kinds of FM systems/chips? I sound probably use some help on the sound/music front 😉 

Well, after spending hours reading the sources, I didn't make a hell of a lot of progress because everything's so abstracted - there's a memory manager and a file cacheing engine and it's hard to get down to the nitty gritty of what exactly the header format is and what exactly the music format is. They also seem to have migrated to a music system called MUSE because several routines are #ifdef'd out of the build.

So a little while ago, I decided that the modding community has obviously cracked that nut decades ago, so why not approach from that angle? The music is stored in a format called IMF (Id sure did have a big ego - heh) and from what I've learned thus far, it's essentially a conversion of MIDI but sounds a lot like VGM. I found a file with the actual IMF tracks ripped as individual files, so I think I'm going to spend some time playing around with those and see if I can make something vaguely resembling the music come out of the emu with some boostrapped translation routines. If I get anything cool, I'll post to the forums.

  • Thanks 1

Share this post


Link to post
Share on other sites
7 hours ago, Jeffrey said:

Short update

The latest release (version 1.2.0) ran at 5 fps.

My local version now runs at 7.5 fps 😀😀 Around a 50% gain in speed.

I'm currently optimizing the dda-algorithm. So far I did two things:

Still more speed to be gained 🙂

Just curious, what sort of math are you doing?IIRC Id used some sort of 2 byte fixed point notation. 

Share this post


Link to post
Share on other sites
Posted (edited)
27 minutes ago, Ed Minchau said:

Just curious, what sort of math are you doing?IIRC Id used some sort of 2 byte fixed point notation. 

I essentially use 2 byte fixed notation. I use one byte for the tile position and one byte for the position within a tile.

For multiplication of a 8-bit fraction (like sine and cosine) and a 16-position (tile + position within it) I temporarily extend to 24 bits and keep only the highest 16 bits. All unsigned numbers.

In order to prevent negative numbers as much as possible I normalize every ray direction as if it was within the frist 90 degrees. I currently have 4 maps in memory to accomodate for that. 

From what I understand from the video mentioned earlier Id used 2 bytes for both the tile position and the position within the tile (so double what I use).  But the precision I get seems enough for now. 😉

 

Edited by Jeffrey

Share this post


Link to post
Share on other sites
20 minutes ago, Jeffrey said:

I essentially use 2 byte fixed notation. I use one byte for the tile position and one byte for the position within a tile.

For multiplication of a 8-bit fraction (like sine and cosine) and a 16-position (tile + position within it) I temporarily extend to 24 bits and keep only the highest 16 bits. All unsigned numbers.

In order to prevent negative numbers as much as possible I normalize every ray direction as if it was within the frist 90 degrees. I currently have 4 maps in memory to accomodate for that. 

From what I understand from the video mentioned earlier Id used 2 bytes for both the tile position and the position within the tile (so double what I use).  But the precision I get seems enough for now. 😉

 

Ok, makes sense.  So what does a ray look like in this notation?  I'm looking at this right now:

https://github.com/id-Software/wolf3d/blob/master/WOLFSRC/WL_DR_A.ASM

and wondering how you're representing variables such as xintercept, yintercept, xtilestep, ytilestep, etc.  My 286 assembly is a little rusty but I've seen a few videos about their technique and I think I get the gist of their code.  Depending on how you're representing those variables I might have some speedups.

Share this post


Link to post
Share on other sites
11 minutes ago, Ed Minchau said:

Ok, makes sense.  So what does a ray look like in this notation?  I'm looking at this right now:

https://github.com/id-Software/wolf3d/blob/master/WOLFSRC/WL_DR_A.ASM

and wondering how you're representing variables such as xintercept, yintercept, xtilestep, ytilestep, etc.  My 286 assembly is a little rusty but I've seen a few videos about their technique and I think I get the gist of their code.  Depending on how you're representing those variables I might have some speedups.

The ray direction is simply an index: for every possible angle I have an index. This is used for lookup in sine, cosine, tan and invtan tables.

The ray starting position is a 16 bit position (tile pos + pos in tile).

And i use the same kind of variables like Wolf3D for stepping/intercepting. So that part is pretty similar I think. But my 286 is also very rusty 😉 I used the video to inform me about how Id did it and figured out a way to implement on the 6502/X16/Vera.

Share this post


Link to post
Share on other sites
46 minutes ago, Jeffrey said:

The ray direction is simply an index: for every possible angle I have an index. This is used for lookup in sine, cosine, tan and invtan tables.

The ray starting position is a 16 bit position (tile pos + pos in tile).

And i use the same kind of variables like Wolf3D for stepping/intercepting. So that part is pretty similar I think. But my 286 is also very rusty 😉 I used the video to inform me about how Id did it and figured out a way to implement on the 6502/X16/Vera.

OK.  How many possible directions can the player face?  If it's only 16, you could have 16 sets of lookup tables for a bunch of variables, one set of lookups for each variable, and just look up values rather than looking up sin, cos etc and multiplying.  If you could just look up the xtilestep etc. for each direction and pixel column you'd be able to save a ton of time. 

I mentioned before using Hscale and Vscale to give yourself a 256x128 main display.  That's the same aspect ratio as the 304x152 original - it occurs to me that they were probably using a screen with a 16:9 aspect ratio and used the bottom 1/9 of the screen for the health bar etc.  With a 256 "pixel" wide and 192 "pixel" high screen (Hxcale and Vscale both $33) a two byte variable could be stored on one bank of RAM: 16 tables of 256 (one for each direction and pixel column) bytes for the low byte of the variable, another 16x256 bytes for the high byte of the variable.  So if you had four such variables (perhaps xpartialup, xpartialdown, ypartialup, ypartialdown?  I need to look at the code more), you'd need four banks of RAM to store the lookups and could avoid a bunch of multiplication.

Share this post


Link to post
Share on other sites
Posted (edited)
52 minutes ago, Ed Minchau said:

OK.  How many possible directions can the player face?  If it's only 16, you could have 16 sets of lookup tables for a bunch of variables, one set of lookups for each variable, and just look up values rather than looking up sin, cos etc and multiplying.  If you could just look up the xtilestep etc. for each direction and pixel column you'd be able to save a ton of time. 

I mentioned before using Hscale and Vscale to give yourself a 256x128 main display.  That's the same aspect ratio as the 304x152 original - it occurs to me that they were probably using a screen with a 16:9 aspect ratio and used the bottom 1/9 of the screen for the health bar etc.  With a 256 "pixel" wide and 192 "pixel" high screen (Hxcale and Vscale both $33) a two byte variable could be stored on one bank of RAM: 16 tables of 256 (one for each direction and pixel column) bytes for the low byte of the variable, another 16x256 bytes for the high byte of the variable.  So if you had four such variables (perhaps xpartialup, xpartialdown, ypartialup, ypartialdown?  I need to look at the code more), you'd need four banks of RAM to store the lookups and could avoid a bunch of multiplication.

There are 304 ray directions in a single screen. The FOV is 60 degrees. So there are 6*304 possible ray directions. For tan and invtan i use tables the size of a quarter of that.

I will have to look into their code in more detail to figure out what they do exactly regarding the resolution/angles/aspect ratios etc.

Edit: when looking at their code it seems  they use 3600 possible "fine" directions (for their tan-table lookup). Which is about double what I use.

Edited by Jeffrey

Share this post


Link to post
Share on other sites
9 hours ago, Jeffrey said:

There are 304 ray directions in a single screen. The FOV is 60 degrees. So there are 6*304 possible ray directions. For tan and invtan i use tables the size of a quarter of that.

I will have to look into their code in more detail to figure out what they do exactly regarding the resolution/angles/aspect ratios etc.

Edit: when looking at their code it seems  they use 3600 possible "fine" directions (for their tan-table lookup). Which is about double what I use.

I'm not being clear. Can the player be pointing in any of those 3600 directions? Or is he limited to 16 (ie N, NNE, NE, ENE, E, etc)? Because if it's the latter, you can avoid doing a bunch of multiplications by looking up variables (indexed by player direction and pixel column) rather than calculating them each time. 

Share this post


Link to post
Share on other sites

Another alternative if the above won't work for you: as long as the player doesn't turn, the xstepsize/ystepsize for every column of pixels will stay the same from one frame of image to the next, so you can save these variables in an array and only recalculate them if the player turns.

Share this post


Link to post
Share on other sites
26 minutes ago, Ed Minchau said:

I'm not being clear. Can the player be pointing in any of those 3600 directions? Or is he limited to 16 (ie N, NNE, NE, ENE, E, etc)? Because if it's the latter, you can avoid doing a bunch of multiplications by looking up variables (indexed by player direction and pixel column) rather than calculating them each time. 

Ah right. I think its around 40 directions. But im not sure. I just took Wolf3D and tried to turn around.

It sounds like a lot of data you want to store, but I havent done the math on it. Right now I spend a lot of RAM on the generated texture-draw code. 

Share this post


Link to post
Share on other sites
30 minutes ago, Jeffrey said:

Ah right. I think its around 40 directions. But im not sure. I just took Wolf3D and tried to turn around.

It sounds like a lot of data you want to store, but I havent done the math on it. Right now I spend a lot of RAM on the generated texture-draw code. 

Yeah, if it's more than 16 directions (or at most 32) it wouldn't be practical.  But if it is 16 and you keep it at 304 columns, then that's 19 pages of RAM for each lookup. I was suggesting squashing it down to 256 columns so that lookups for 2 variables could be stored in one RAM bank, or if it's 32 directions then one per RAM bank.

Share this post


Link to post
Share on other sites

Short update:

Now running at 10+ fps 😀

The optimizations I did (on top of the previous ones that got me to 7.5 fps) are:

  • inlining the fast multipliers, so less copying of values, no jsr and rts
  • re-using the somewhat "static" parts of the multiplications, so it won't be re-loaded each time (this was harder than it sounds, quite of bit of refactoring done)
    • Cosine and Sine fractions are player-related, and even though they are negated sometimes, they (that is their squares) could be reused for (almost) each ray
    • The (square of the) fraction of the tile the player is standing in -to be used for calculating the initial x/y interception for each ray- could be reused
  • cleaned up the main loop and several other parts
  • replaced the 16-bit slow divider with a 512-entry table: distance2height (major improvement!!) 🙂

I am quite happy with the speed. The demo plays quite nicely now.

  • Like 5

Share this post


Link to post
Share on other sites
Posted (edited)

@Jeffrey - I've managed to dig up the locations of the AdLib music data from the original game files. Fortunately, it's not compressed in Wolf3d so loading it will be easy enough. I've been making a few notes for myself, and tonight's project is going to be a quick and dirty player routine with my "first best guess" at a naive 'translation' between OPL and OPM and see if anything resembling the actual Wolf3d music comes out of the box. Hopefully, all of my time is not spent reading the register layouts for the OPL w/o any sort of code being written. 🙂

We did catch a very lucky break as far as the game music is concerned - OPL has 9 voices, and Id seems to have always reserved voice 0 for SFX by convention, so that means no more than 8 music voices in the data. Woot!

If it works, then I'll re-write it in assembly and donate the player code to you for your project. The playback rate is strange tho - 700Hz. Not sure how easy that will be to generate amidst the frenzied raycasting. 🙂

Update:
I wasn't able to get far enough to get actual sound out of the program yet, but I did make a lot of progress. I've got the IMF traversal going (which was trivial I must admit) and I've wired a bunch of registers together, but didn't get as far as freq. / keyon registers. Those work pretty differently. Unfortunately, there's just no way the YM is going to sound super-close to the OPL, depending on what features the tunes use because the OPL has several features the OPM just doesn't have - most notably it can use different waveforms than sine waves. There's also the matter of the percussion voice mode, which I think can be kludged using the VERA PSG.

Hopefully I'll have a demo to post tomorrow night.

Edited by ZeroByte
update
  • Like 2
  • Thanks 1

Share this post


Link to post
Share on other sites
7 hours ago, ZeroByte said:

@Jeffrey - I've managed to dig up the locations of the AdLib music data from the original game files. Fortunately, it's not compressed in Wolf3d so loading it will be easy enough. I've been making a few notes for myself, and tonight's project is going to be a quick and dirty player routine with my "first best guess" at a naive 'translation' between OPL and OPM and see if anything resembling the actual Wolf3d music comes out of the box. Hopefully, all of my time is not spent reading the register layouts for the OPL w/o any sort of code being written. 🙂

We did catch a very lucky break as far as the game music is concerned - OPL has 9 voices, and Id seems to have always reserved voice 0 for SFX by convention, so that means no more than 8 music voices in the data. Woot!

If it works, then I'll re-write it in assembly and donate the player code to you for your project. The playback rate is strange tho - 700Hz. Not sure how easy that will be to generate amidst the frenzied raycasting. 🙂

Update:
I wasn't able to get far enough to get actual sound out of the program yet, but I did make a lot of progress. I've got the IMF traversal going (which was trivial I must admit) and I've wired a bunch of registers together, but didn't get as far as freq. / keyon registers. Those work pretty differently. Unfortunately, there's just no way the YM is going to sound super-close to the OPL, depending on what features the tunes use because the OPL has several features the OPM just doesn't have - most notably it can use different waveforms than sine waves. There's also the matter of the percussion voice mode, which I think can be kludged using the VERA PSG.

Hopefully I'll have a demo to post tomorrow night.

Cool!! Thanks in advance! 🙂 It would be so cool to have some music.

Share this post


Link to post
Share on other sites

I need to do a little further digging, but I think I can speed up the math.  John Carmack mentioned in the documentation that there was a better solution staring him in the face.  It's probably staring the rest of us in the face too.

Share this post


Link to post
Share on other sites
Posted (edited)

Ok, I thought of one possible speedup. Right now you're doing each column of pixels in order, casting out a ray and eventually hitting a wall; from that spot you're determining the image to use and the column of pixels within that image and what distance (in the direction the player is facing), then pushing that to VERA. Same as Id did back in the day, basically. 

But what if when you push the column of pixels to VERA, you also kept note of the source column of pixels and the distance, and most importantly the map block and the side of the block seen?  

You wouldn't need to cast the rays in a specific order. Instead you could cast a ray to the far left side, another to the far right side, and one to a column in the middle. Then you can keep dividing the image in half and casting rays.

The advantage here is when two rays hit the same block on the same face. All the rays between those two rays will hit the same visible face as those two, evenly spaced between them. So for any given ray to cast, you look at the existing rays to the left and right and if they are both on the same visible face, then you can very quickly fill in all the rays between them without casting at all, just a linear interpolation. 

You could actually fill in that little table first, and then use it to send all of the columns to VERA in order. I think this will more than double the speed; the total number of rays cast will be much less.

Edited by Ed Minchau
  • Like 1

Share this post


Link to post
Share on other sites
Posted (edited)
42 minutes ago, Ed Minchau said:

Ok, I thought of one possible speedup. Right now you're doing each column of pixels in order, casting out a ray and eventually hitting a wall; from that spot you're determining the image to use and the column of pixels within that image and what distance (in the direction the player is facing), then pushing that to VERA. Same as Id did back in the day, basically. 

But what if when you push the column of pixels to VERA, you also kept note of the source column of pixels and the distance, and most importantly the map block and the side of the block seen?  

You wouldn't need to cast the rays in a specific order. Instead you could cast a ray to the far left side, another to the far right side, and one to a column in the middle. Then you can keep dividing the image in half and casting rays.

The advantage here is when two rays hit the same block on the same face. All the rays between those two rays will hit the same visible face as those two, evenly spaced between them. So for any given ray to cast, you look at the existing rays to the left and right and if they are both on the same visible face, then you can very quickly fill in all the rays between them without casting at all, just a linear interpolation. 

You could actually fill in that little table first, and then use it to send all of the columns to VERA in order. I think this will more than double the speed; the total number of rays cast will be much less.

Thanks! 🙂 I think this a sound idea. Maybe a binary search approach would go well with this.

Not sure how much overhead vs gains would be with this technique. Will have to investigate.  

Edited by Jeffrey

Share this post


Link to post
Share on other sites
Posted (edited)
6 hours ago, Jeffrey said:

Thanks! 🙂 I think this a sound idea. Maybe a binary search approach would go well with this.

Not sure how much overhead vs gains would be with this technique. Will have to investigate.  

Here is what I see for overhead for every column:

1) 2 bytes to point to which column of pixels to check

2) 2 bytes to point to the column to the left which has already been checked

3) 2 bytes to point to the column to the right which has already been checked

4) 1 byte to indicate which map block was intersected by a ray

5) 1 byte (actually just two bits) to indicate which face of the block was hit by the ray

6) 1 byte (actually just 6 bits) to indicate which column of the map block's pixel image is displayed

7) 2 bytes for the distance in the direction the player is pointing (or the value between 0 and 511 corresponding to the bespoke draw routine)

So that's ten or eleven bytes of data that have to be stored for each column.  The first six bytes listed can be generated ahead of time and wouldn't change.

You can set the high byte of the distance to FF for all rays to begin with, and then when you are testing a ray the first thing you do is see if that byte is FF; if not, then you've already linearly interpolated that column and can skip to the next entry on list 1.  If it is FF then you can check the one to the left and the one to the right, and if they have the same group 4 and group 5 you can linearly interpolate the entire range between them.  If they're not the same, then you cast a ray as before.

Column 0 and column 303 would be raycast as before and would be the only two you'd absolutely have to cast.  After that, the third entry on list 1 would point somewhere in the middle and its left pointer would be 0 and its right pointer 303.  Suppose you choose column 256 for the third entry on the group 1 list and column 288 for the fourth entry; after that you've got one group of 256, one group of 32 and one group of 16, so it's all binary splitting from there.  All the linear interpolations would have a power of two in the denominator.  If you wanted to make these interpolations bespoke routines like you did with pushing data to VERA, then there would only be a maximum of 8 of these subroutines. Or you could cast a ray every 16 columns and then try linear interpolations, and have at most 4 linear interpolation subroutines (filling in 1, 3, 7 or 15 entries).

The group 2 value for each column would be the largest value less than the current column number in the group 1 table above the current entry, the group 3 value would be the smallest value greater than the current column number in the group 1 table above the current entry; you can probably generate the first three groups of data with a spreadsheet.  Groups 4 through 7 would be generated on the fly with every image you draw.  If you really need to cram the memory, group 5 and 6 can be combined, both values in a single byte.

At a rough estimate, you'll be able to linearly interpolate 80 to 90 percent of the rays.

Edited by Ed Minchau

Share this post


Link to post
Share on other sites
Posted (edited)

Thinking more about interpolation:

Suppose pixel column 0 and pixel column 8 are pointing at the same map block and the same face; say both pointing at the north side of map block 0xFE.  Suppose further that column 0 is pointing to (group 6) the leftmost column of the image associated with that map block (call it image column 0) and pixel column 8 is pointing to (group 6) an image column about 2/3 of the way to the right, image column 42.   And let's say for pixel column 0 the group 7 data is 511 (it's really far away) and for pixel column 8 the group 7 data is 502 (a little closer). 

So, we've got 7 pixel columns to interpolate, and two ranges: 42-0= 42 for the image column range and 502-511= -9 for the height or distance range.

You've already got a macro or subroutine that multiplies a 16 bit number by an 8 bit number representing the range 0/256 to 255/256.  So for a 7 point interpolation like this, you'd have a table of 7 numbers to multiply these ranges by 1/8, 2/8, 3/8... 7/8.  So you use that macro to multiply the lookup by the range and add the value stored in the leftmost pixel column's group 6/7 values. 

So pixel column 1's image column would be pixel column 0's image column (0) plus 1/8 times the range (42) = 5 and the height would be 511 + (1/8)(-9) = 510.  All the others between pixel column 2 and pixel column 7 would be just that easy, just increment the interpolation pointer as you increment the pixel column pointer.

A 15 pixel range would need a table of 15 numbers, a 31 pixel range would need a table of 31 numbers... you could have these multiplication lookup tables for interpolating 1, 3, 7, 15, 31, 63, and 127 pixel columns all in one 256 byte page of RAM.

So, total overhead: 10 or 11 bytes for each pixel column; in practical terms you'd have 512 bytes set aside for each variable even though there's only 304 columns.  The remaining 192 bytes can remain empty or be used for other purposes such as the linear interpolation tables.  So that's 20-22 pages of RAM, 5 to 5.5 kb, depending on whether or not you keep groups 5 and 6 separate.  You could use slightly less memory by cramming it together more, at the expense of speed.

Edited by Ed Minchau

Share this post


Link to post
Share on other sites
Posted (edited)

It just occurred to me that you wouldn't need a fraction lookup table at all to interpolate 255 pixel columns; the fraction to multiply by would be the column number itself.  Also, instead of raycasting column 303 it might work better to cast the imaginary column 304 to make the binary split work better.

Edited by Ed Minchau
  • Like 1

Share this post


Link to post
Share on other sites
Posted (edited)

This algorithm sounds pretty good. I've been doing a little thinking about it myself, and perhaps a recursive algorithm is in order. Essentially you call Render(MinX, MaxX, MyRay) and it works as follows:

  • Interpolate the leftmost column (TestCol) of the face the ray hit, clamp TestCol to MinX and cast a ray (testRay) at TestCol
    • If TestCol hits the same face,
      • render from that column to MyRayX.
      • new range is MinX .. TestCol
      • Cast a new TestRay at the center of the new range
      • call Render with the new range and testRay
    • Else call Render with MinX=MinX, MaxX = MyRay.X, TestRay
      • Render returns the rightmost column it rendered - set MinX = returned column+1
      • (it is assumed that everything from old MinX to MyRay.X has now been completely rendered.
  • Do the same thing going rightwards from MyRay.X
    • i.e. if the right end of the face is visible, render from MyRay to that column, and recursive call to render on the remaining rightmost space, else render the rightmost space, then render from MyRay.X to the leftmost column not rendered.

I think the hidden challenge in this method is going to be in providing stack-like behavior for the MinX/MaxX, MyRay values so they don't get lost during the call to yourself. Another optimization would probably be to choose some minimum number of columns where it's faster to just brute force that left to right, and just do it if MinX/MinY are smaller than this threshold.

This algorithm should be very fast when walls are close to the camera, and might be slower when there's a lot of clutter in the wall topology.

Edited by ZeroByte
Realized my algorithm would actually render all the way up to the center column, not just up to the interpolated leftmost column.

Share this post


Link to post
Share on other sites
8 minutes ago, desertfish said:

won't this break when you add non-wall elements such as monsters, projectiles

In the original game the bad guys/ projectiles were all sprites.  That's going to take some trickery to make it work here because we can't resize sprites.

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.


×
×
  • Create New...

Important Information

Please review our Terms of Use