Jump to content
  • 0

Equinoxe game engine progress


svenvandevelde
 Share

Question

I've done a lot of optimizations lately on the equinoxe game engine that i'm building for the CX16, using the kickc compiler... thought of sharing this.
The below video shows 30 sprites floating in a controlled path using a rudimentary state machine with vector rotation and movement logic simultaneously over a 640x480 screen.
On top, the sprites rotate, there is also the movement of the player sprite and there is a spacial grid updated using a hash table and linked list to further build a logic for collision detection.

The side bars show the CPU used per frame. And that where the challenge is ... With 480 rasters per frame to be drawn, at 8 Mhz CPU each raster line covers about 277 cycles.
That is not much. So with 30 sprites having simultaneously a vectorized calculation using 3 byte fixed point math (2 bytes integer and 1 byte decimal point), spacial grid fill of the bounding boxes (some sprites in the grid can cover up to 4 cells in the grid depending on the position), and then the movement using the VERA port based api (it's cumbersome), then this is a progress I would say.

 

 

My biggest challenge is now how to further optimize the assembler. The indirect indexed mode addressing is taking 5 to 6 cycles for each x and y calculation for each sprite. And other fields in the control record for each sprite also can only be addressed using indirect indexed addressing. Otherwise i would need to copy data over from area to area to code in absolute (indexed) addressing, and that also takes time.

I'm sure you recognize what I mean:

image.png.eac10221d77a672041d96063642efe2d.png

 

Any ideas?  

Link to comment
Share on other sites

9 answers to this question

Recommended Posts

  • 0
Posted (edited)
On 3/17/2022 at 3:55 PM, SlithyMatt said:

Are the addresses stored at "fp3" and "add" changing much? Since your Y index is always a known value, you may be better off just using absolute addressing.

So I cannot use absolute addressing since the fp3 and add are changing a lot. Basically there is an iteration of all sprites (which are stored in a linked list in BRAM).

And per sprite, there is "Enemy" sprite pointer expressed by enemy, that contains various fields with steering data and properties ...

image.png.311298e43ab00660f8d93969af8d2ad3.png

This is translated in C code to a zeropage pointer...

image.png.82b9ab46382f2f1cf3b7c13cf191579f.png

which is used like this: (the statement enemy-size == SIDE_ENEMY)

image.png.c9127c857c727d308e0159597a0f608a.png

 

I have thought of copying the whole data block under enemy_handle_ptr to a fixed memory location and then, as you say apply absolute memory addressing, 
but that slows down the process too much, since i need to first read the data, then update it, and then write the data back. And this for 30 sprites ...

Edited by svenvandevelde
Link to comment
Share on other sites

  • 0
On 3/17/2022 at 3:55 PM, SlithyMatt said:

Are the addresses stored at "fp3" and "add" changing much? Since your Y index is always a known value, you may be better off just using absolute addressing.

You got me thinking ...

Instead of using "pointers" to enemies in bram, i could maybe make a fixed memory engine using a "pool" of sprite control blocks in low ram.
And as the game evolves, I can copy in the sprite control data to the "free" pool of sprite control blocks.

In this way, i can avoid this indirect addressing, and I can copy in the sprite control blocks gradually during game progress.
Only when specific enemies become active from the linked list in bram.

It will require copying data, but if i can avoid copying a lot of bytes, then that might be the solution!!!

In reality, there won''t be more than 64 sprites alive at the same time.

Sven

Link to comment
Share on other sites

  • 0

Another thing to consider is just unrolling the options and have a jump table to routines that are the same but operate on different blocks of RAM. If it's only an order of 64 and the routine is short, the footprint for this won't be too expensive.

  • Like 1
Link to comment
Share on other sites

  • 0
Posted (edited)

Further progress ... So I've rewritten the enemy sprite controlling code to use absolute indexed addressing now.
It not only reduces the number of cycles for the actual addressing, but also the overhead. No more zeropage registers to be used and initialized before indexing.
Everything is embedded in the code.

Your idea was a good one @SlithyMatt. Also @Jesper Gravgaard it is truly amazing how your optimizing compilers plays with the registers to optimize the code for speed and code footprint.
I really recommend anyone to try out kickc. You can contact me if you need end-user help or assistance. Happy to share my experience on it.

 

Edited by svenvandevelde
  • Like 2
Link to comment
Share on other sites

  • 0

So now, i have the first version of collision detection working. 32 sprites in the air, a handful of bullets. and find the real collisions!

Note that the brown area in the border is indicating the CPU time to detect the collisions and eliminate them.

The actual detection is done through a spacial grid using a hashing algorithm.

The brown colored CPU is way too much, i still need to find ways to optimize the logic, and there is still room for improvement.

But it is working now ...

 

  • Like 4
Link to comment
Share on other sites

  • 0
Posted (edited)

Now it is finally coming together. I have now optimized the spacial grid distribution. One of the key advantages of the CX16 is that is has memory.
Loads of it. Sufficient for an 8-bit machine to do stuff that using pure computation would bring the 6502 flat. They key is lookup tables or calculation tables in memory.
Now with 64 banks of additional ram, this flight model now uses hash tables placed at certain banks that allow for fast spacial grid distribution of the objects.
It calculates a key based on the x/y coordinate of each object, modulus 64. In other words, there are 640%64 and 480%64 grid cells in the hashed tabled.
That makes a total of 80 cells, 10 columns horizontally and 8 rows vertically. Of course, there can be more objects than there are grid cells, so collisions happen.
However, the hashing logic puts all collisions in a linked list. The logic however avoids now pointers totally, all is captured in tables. Even the linked list is nothing more than 
a structure that has a data and a next index, all elements stored in a pre-defined table of 128 of these elements. So in other words, in one bank, the complete hash table can be stored.
The distribution of the hashed elements is faciliated again using a lookup table of 256 bytes, which is a list of random numbers from 0 to 255, where each element is one number
of the series. So using this lookup table, a random placement of the hashed keys can be achieved. A good distribution function is required to ensure performance of the hashing table.
An other thing that has been done, is to capture in the hashing table the sprites according to a certain grouping, like player, enemy, player bullets, enemy bullets, scenery etc.
The grouping is part of the hashing key, so using the grouping + x/y coordinates, each cell in the spacial grid can be quickly retrieved.
This allows to retireve all the enemy sprites in a grid cell, and all bullets in a grid cell and then compare them using an algorithm ...

Have a look below to see the effect:

I will write all this down with coding examples so that this logic can be shared in the group for usage later.
But here is a small taste:
 

image.png.70f7978324575e8f872b4b7c74088f68.png

Edited by svenvandevelde
  • Like 6
Link to comment
Share on other sites

  • 0
Posted (edited)

Equinoxe Flightmodel 5 ... Now the ground scrolling using the random tile engine is back alive and added to the mix... Still need to optimize the tile engine as it still is working with zeropage level pointers causing indirect addressing mode to be applied in the generate assembly, causing relatively slow performance. There is still a remaining glitch of those black areas that i need to sort out, and i feel that there is memory overwritten as a specific place that i also need to check further. (not in banked memory by the way, but in the main ram).

The below video shows the preliminary result, and it shows also how the tile engine adds to the drawing overhead per scan line. It is already doable, but not good enough.
I need to smoothen the large copy action of tiles (drawing each tile row plus at row 31, copying row 32 to row 0 to allow a seamless vertical scroll effect) to be smoothened frame per frame instead of doing all in one frame :))

Anyway, the scrolling now works again and the engine is in progress. Note the debug echo in the terminal emulator with informational text when loading ...

 

 

Edited by svenvandevelde
  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Answer this question...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...

Important Information

Please review our Terms of Use