
VERA and number 48 ...



On 10/3/2022 at 2:37 PM, Wavicle said:

I will think on this. I think calculating row address with 48 pixels isn't too bad. E.g.:

row = y_counter - sprite_y_start;
if (mode) begin
    // 8bpp
    line_addr = sprite_offset + (row << 6) - (row << 4); // row * 64 - row * 16
end
else begin
    // 4bpp
    line_addr = sprite_offset + (row << 5) - (row << 3); // row * 32 - row * 8
end

Something along those lines should work. I probably need to wake up a bit more and reality check this against the Verilog. Another concern is breaking any existing software that uses 64x64 sprites.
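The shift-and-subtract idea above can be sanity checked in plain C. This is only a minimal sketch, not VERA's Verilog: the helper name `line_addr_48` and the assumption that rows are packed back to back (48 bytes per row at 8bpp, 24 at 4bpp) are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: byte address of row `row` of a 48-pixel-wide sprite.
 * bpp must be 4 or 8. Assumes rows are stored contiguously. */
uint32_t line_addr_48(uint32_t sprite_offset, uint32_t row, unsigned bpp)
{
    assert(bpp == 4 || bpp == 8);
    if (bpp == 8) {
        /* 48 bytes per row: row * 64 - row * 16 */
        return sprite_offset + (row << 6) - (row << 4);
    }
    /* 24 bytes per row: row * 32 - row * 8 */
    return sprite_offset + (row << 5) - (row << 3);
}
```

Two shifts and one subtraction replace the multiply, which is presumably why the 48-pixel case is cheap to add in hardware.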

Did you have a chance to look more into this after waking up? If the calculations aren’t too complicated, I think 48-pixel sprites are a good suggestion. Breaking existing software is not a problem as I see it. We all know we write software for a prototype.


On 10/3/2022 at 1:30 PM, svenvandevelde said:

One more note, since the cx16 doesn't have dma, every memory movement is down to the processor. Moving memory is fast, but moving 4096 bytes every frame is not an option (so i cannot use 64x64 in 8bpp) for animations, bottom line.

Note that the video shows animations of two 64x64 sprites floating.

So you're not animating by having all frames in VRAM and updating indexes, but by giving each object a VRAM allocation and each one updates its pixel data directly? That definitely would consume lots of CPU at scale w/o a DMA chip. That's how Sega did Sonic's animation, but the Genesis has DMA....
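To put the 4096-byte figure in perspective, here is a rough sketch of the arithmetic. `sprite_bytes` follows directly from width, height and color depth; `copy_cost_percent` is only an estimate, and its cycles-per-byte, clock and frame-rate inputs are assumptions supplied by the caller rather than measured values.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of pixel data for a width x height sprite at 4 or 8 bits per pixel. */
uint32_t sprite_bytes(uint32_t width, uint32_t height, unsigned bpp)
{
    assert(bpp == 4 || bpp == 8);
    return width * height * bpp / 8;
}

/* Rough share (in whole percent) of one frame's CPU cycles spent copying
 * `bytes`, for an assumed copy cost in cycles per byte. */
uint32_t copy_cost_percent(uint32_t bytes, uint32_t cycles_per_byte,
                           uint32_t cpu_hz, uint32_t fps)
{
    uint32_t cycles_per_frame;
    assert(fps > 0 && cpu_hz >= fps);
    cycles_per_frame = cpu_hz / fps;
    return bytes * cycles_per_byte * 100 / cycles_per_frame;
}
```

For example, `sprite_bytes(64, 64, 8)` is 4096, and with an assumed 10 cycles per byte at 8 MHz and 60 frames per second, copying that much takes roughly 30% of a frame for a single sprite.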


On 10/3/2022 at 8:50 PM, Johan Kårlin said:

Did you have a chance to look more into this after waking up? If the calculations aren’t too complicated, I think 48-pixel sprites are a good suggestion. Breaking existing software is not a problem as I see it. We all know we write software for a prototype.

 

Thank you.


On 10/3/2022 at 8:52 PM, ZeroByte said:

So you're not animating by having all frames in VRAM and updating indexes, but by giving each object a VRAM allocation and each one updates its pixel data directly? That definitely would consume lots of CPU at scale w/o a DMA chip. That's how Sega did Sonic's animation, but the Genesis has DMA....

I have two things implemented in this algorithm: a least recently used (LRU) cache, which tracks which VERA image was least recently used, and a heap manager, which dynamically allocates the images in VRAM and hands out an index (handle) for each one. When the drawing engine wants to draw a sprite image, it first checks whether that image is already in the LRU cache.

If it is in the LRU cache, it will just re-use the image already in VRAM.

If it is not in the LRU cache, it loops until the required image can be put into VRAM. How?
Each pass removes the last entry of the LRU cache (the least recently used image) and frees that image from VRAM.
Then it tries to best-fit the new image in VRAM (it checks whether there is space for it). If the best-fit search fails because the space freed by the evicted image is too small, it evicts the next least recently used image and frees its VRAM as well.
This repeats until the heap manager can best-fit the image in VRAM. The image is then copied into VRAM and added to the LRU cache as the most recently used image.

Images are copied from BRAM into VRAM using a copy function, which I worked very hard on to make optimal. I still need to work on this copy module.

See below the lru cache core utilization logic for managing the images.

vera_sprite_image_offset sprite_image_cache_vram(fe_sprite_index_t fe_sprite_index, unsigned char fe_sprite_image_index) {
    // check if the image in vram is in use where the fe_sprite_vram_image_index is pointing to.
    // if this vram_image_used is false, that means that the image in vram is not in use anymore (not displayed or destroyed).
 
    unsigned int image_index = sprite_cache.offset[fe_sprite_index] + fe_sprite_image_index;
 
    // We retrieve the image from BRAM from the sprite_control bank.
    // TODO: what if there are more sprite control data than that can fit into one CX16 bank?
    bank_push_set_bram(fe.bram_sprite_control);
    heap_bram_fb_handle_t handle_bram = sprite_bram_handles[image_index];
    bank_pull_bram();
 
    // We declare temporary variables for the vram memory handles.
    lru_cache_data_t vram_handle;
    vram_bank_t vram_bank;
    vram_offset_t vram_offset;
 
    // We check if there is a cache hit?
    lru_cache_index_t vram_index = lru_cache_index(&sprite_cache_vram, image_index);
    lru_cache_data_t lru_cache_data;
    vera_sprite_image_offset sprite_offset;
    if (vram_index != 0xFF) {
 
        // So we have a cache hit, so we can re-use the same image from the cache and we win time!
        vram_handle = lru_cache_get(&sprite_cache_vram, vram_index);

        vram_bank = vera_heap_data_get_bank(VERA_HEAP_SEGMENT_SPRITES, vram_handle);

        vram_offset = vera_heap_data_get_offset(VERA_HEAP_SEGMENT_SPRITES, vram_handle);
 
        sprite_offset = vera_sprite_get_image_offset(vram_bank, vram_offset);
    } else {
 
        // The idea of this section is to free up lru_cache and/or vram memory until there is sufficient space available.
        // The size requested contains the required size to be allocated on vram.
        vera_heap_size_int_t vram_size_required = sprite_cache.size[fe_sprite_index];
 
        // We check if the vram heap has sufficient memory available for the size requested.
        // We also check if the lru cache has sufficient elements left to contain the new sprite image.
        bool vram_has_free = vera_heap_has_free(VERA_HEAP_SEGMENT_SPRITES, vram_size_required);
        bool lru_cache_not_free = lru_cache_max(&sprite_cache_vram);
 
        // Free up the lru_cache and vram memory until the requested size is available!
        // This ensures that vram has sufficient place to allocate the new sprite image.
        while(lru_cache_not_free || !vram_has_free) {
 
            // If the cache is at its maximum, before we can add a new element, we must remove the least recently used image.
            // We search for the least recently used image in vram.
            lru_cache_key_t vram_last = lru_cache_last(&sprite_cache_vram);
 
            // We delete the least recently used image from the vram cache, and this function returns the stored vram handle obtained by the vram heap manager.
            vram_handle = lru_cache_delete(&sprite_cache_vram, vram_last);
            if(vram_handle==0xFFFF) {
                gotoxy(0,59);
                printf("error! vram_handle is nothing!");
            }
 
            // And we free the vram heap with the vram handle that we received.
            // The cache stores the vram heap handle itself, so no conversion from the sprite offset is needed.
            vera_heap_free(VERA_HEAP_SEGMENT_SPRITES, vram_handle);

            // Re-evaluate both loop conditions; otherwise a full cache would keep the loop running forever.
            vram_has_free = vera_heap_has_free(VERA_HEAP_SEGMENT_SPRITES, vram_size_required);
            lru_cache_not_free = lru_cache_max(&sprite_cache_vram);
        }
 
        // Now that we are sure that there is sufficient space in vram and on the cache, we allocate a new element.
        // Dynamic allocation of sprites in vera vram.
        vram_handle = vera_heap_alloc(VERA_HEAP_SEGMENT_SPRITES, (unsigned long)sprite_cache.size[fe_sprite_index]);
        vram_bank = vera_heap_data_get_bank(VERA_HEAP_SEGMENT_SPRITES, vram_handle);
        vram_offset = vera_heap_data_get_offset(VERA_HEAP_SEGMENT_SPRITES, vram_handle);
 
        memcpy_vram_bram(vram_bank, vram_offset, heap_bram_fb_bank_get(handle_bram), (bram_ptr_t)heap_bram_fb_ptr_get(handle_bram), sprite_cache.size[fe_sprite_index]);
 
        sprite_offset = vera_sprite_get_image_offset(vram_bank, vram_offset);
        lru_cache_insert(&sprite_cache_vram, image_index, vram_handle);
    }
 
    // We return the image offset in vram of the sprite to be drawn.
    // This offset is used by the vera image set offset function to directly change the image displayed of the sprite!
    return sprite_offset;
}
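The evict-until-it-fits loop can also be modelled in isolation. The following is a toy sketch, not the heap manager or LRU cache used above: all names (`toy_cache_t`, `toy_init`, `toy_use`) are hypothetical, the cache has only four slots, and "VRAM" is a plain free-byte counter, so fragmentation and best-fit placement are deliberately ignored.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 4            /* toy capacity, not the real cache size */

typedef struct {
    uint16_t key[CACHE_SLOTS];   /* image index */
    uint16_t size[CACHE_SLOTS];  /* bytes held in "VRAM" */
    uint32_t stamp[CACHE_SLOTS]; /* last-use tick, 0 means the slot is empty */
    uint32_t tick;               /* increases on every use */
    uint32_t vram_free;          /* free bytes; fragmentation is ignored */
} toy_cache_t;

void toy_init(toy_cache_t *c, uint32_t vram_bytes)
{
    for (int i = 0; i < CACHE_SLOTS; i++) c->stamp[i] = 0;
    c->tick = 0;
    c->vram_free = vram_bytes;
}

/* Returns 1 on a cache hit, 0 if the image had to be loaded,
 * and -1 if it does not fit even with the cache emptied. */
int toy_use(toy_cache_t *c, uint16_t key, uint16_t size)
{
    assert(size > 0);
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (c->stamp[i] != 0 && c->key[i] == key) {
            c->stamp[i] = ++c->tick;      /* hit: now most recently used */
            return 1;
        }
    }
    for (;;) {
        int empty = -1, oldest = -1;
        for (int i = 0; i < CACHE_SLOTS; i++) {
            if (c->stamp[i] == 0) {
                if (empty < 0) empty = i;
            } else if (oldest < 0 || c->stamp[i] < c->stamp[oldest]) {
                oldest = i;
            }
        }
        /* Both conditions are checked again on every pass. */
        if (empty >= 0 && c->vram_free >= size) {
            c->key[empty] = key;
            c->size[empty] = size;
            c->stamp[empty] = ++c->tick;
            c->vram_free -= size;
            return 0;
        }
        if (oldest < 0) return -1;        /* nothing left to evict */
        c->vram_free += c->size[oldest];  /* evict least recently used */
        c->stamp[oldest] = 0;
    }
}
```

With 1024 bytes of toy VRAM, using images 1 and 2 (512 bytes each), touching 1 again and then requesting image 3 evicts image 2, because image 1 was used more recently.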

   

Edited by svenvandevelde

My $0.02's worth - I understand about the intermediate 48px size, but I honestly can't name one of the other sizes that should be discarded in favor of 48px.
8 and 16 - no way. Those are tile-sized aspects and there are far too many uses for such, and furthermore, there are far too many "little" things that would become bloated in VRAM if 8x8 were dropped as a sprite size, especially. I've already done many programs where I used free slots in the tilemap as sprite slots or else used letters/numbers from the tilemap as digits, e.g. the score counter in Flappy Bird....
64 - useful for bosses / making panels for things like radar / HUD overlays / popup dialogs, etc. (for instance, in games like StarFox where a character's portrait pops up along with their dialog/voice clip, having a 64x64 sprite for this is perfect)

32 is probably the only one I'd say is "up for discussion" as an atomic size that could be considered expendable in favor of 48. Both are "middle" sizes, and maybe 40 or 48 offer more utility than 32, but the extremes are both quite valuable and dare I say, indispensable?

 


On 10/3/2022 at 9:25 PM, ZeroByte said:

My $0.02's worth - I understand about the intermediate 48px size, but I honestly can't name one of the other sizes that should be discarded in favor of 48px.
8 and 16 - no way. Those are tile-sized aspects and there are far too many uses for such, and furthermore, there are far too many "little" things that would become bloated in VRAM if 8x8 were dropped as a sprite size, especially. I've already done many programs where I used free slots in the tilemap as sprite slots or else used letters/numbers from the tilemap as digits, e.g. the score counter in Flappy Bird....
64 - useful for bosses / making panels for things like radar / HUD overlays / popup dialogs, etc. (for instance, in games like StarFox where a character's portrait pops up along with their dialog/voice clip, having a 64x64 sprite for this is perfect)

32 is probably the only one I'd say is "up for discussion" as an atomic size that could be considered expendable in favor of 48. Both are "middle" sizes, and maybe 40 or 48 offer more utility than 32, but the extremes are both quite valuable and dare I say, indispensable?

 

Very cool feedback. Yes, you are probably right and the 8px is required as a sprite map. It would be great if the VERA allowed configuring the sprite map sizes somehow with a bitmap flag, just as @AndyMt already indicated. Your suggestion to blow up the 32px is a good one. Or to reduce the 64px. I think a "dynamic solution" would maybe be the best.


That dynamic heap manager makes things easier from the programmer's perspective,  but it adds a lot of computational overhead at run time. It might be that one of the tradeoffs of having the dynamic heap manager is slightly more complicated sprite management code. And really it's all about tradeoffs.

The current sprite dimensions make a lot of sense if speed is the goal, and 48 or 40 make sense from an ease-of-programming goal. Hard call.

Breaking a 48x48 into 2 or 4 or 9 sprites is probably the best option here. Combining smaller sprites into a single image actually sounds like something the heap manager should be good at.

Addendum: maybe dividing up into several heaps, one of 16x16, one of 32x32, one of 8x16 etc. Each heap just one size of sprite, perhaps all of them controlled by a master heap manager? It's just one more layer of abstraction. 

I definitely want to keep the 8x8. The stars in Asteroid Commander are 8x8 4bpp sprites, and really only one pixel in the center has color. Anything more than 32 bytes and I'd have to cut the number of stars in half. Getting rid of 8x8 also means getting rid of 8x16 and 16x8 and 32x8 (which I also use) and 8x32 and 64x8 and 8x64.

Edited by Ed Minchau


Thanks Ed. I read your mail with great interest. 

On 10/4/2022 at 12:34 AM, Ed Minchau said:

Breaking a 48x48 into 2 or 4 or 9 sprites is probably the best option here.

Probably yes, but not convinced. Allow me to reflect my thinking about this idea a bit further, let's talk about complexity elements, like heap manager, cpu overhead, code size and data size...

Complexity factors to be taken into account:

  • Copying the image data (in my case 1152 bytes) onto those asymmetrically distributed sprites. For a 48x48 I would paint a 32x32, a 32x16, a 16x32 and a 16x16. This combination is the most memory efficient.
  • Selecting which sprite offsets to use. It matters because sprites with offsets lower in memory are painted above sprites with offsets higher in memory. So when sprites overlap each other, in my case enemies, you want a consistent overlap, not one part of an enemy in front while another part appears behind the other sprites. So the sprite offsets have to be selected carefully. Maybe the offsets even have to be sorted, which also takes CPU time.
  • Moving the sprites. Moving adds CPU overhead. Imagine 16 enemies moving on the screen, each enemy made of 4 sprites. Moving, in my case, means two things: x and y position updates plus animation image updates. So that would mean 4 (sprites) times 6 bytes (4 for x/y plus 2 for the image offset address) times 16 enemies, or 384 bytes to be updated every frame. It's not that the CX16 cannot pull this off at 8 MHz, it can do this easily, but it adds CPU time per frame.

These things can be done but it gives extra CPU overhead and extra code to manage all this.
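The bookkeeping for such a composite sprite can be sketched as follows. The part layout is the 32x32 / 16x32 / 32x16 / 16x16 split mentioned above; the names are illustrative, and the per-part attribute cost is passed in by the caller (6 bytes would be 2 address bytes plus 2 bytes each for X and Y, as I understand the VERA sprite attribute layout).

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t x, y, w, h; } part_t;

/* One way to tile a 48x48 image with power-of-two sprites.
 * x and y are offsets from the top-left corner of the 48x48 image. */
static const part_t PARTS_48[4] = {
    {  0,  0, 32, 32 },
    { 32,  0, 16, 32 },
    {  0, 32, 32, 16 },
    { 32, 32, 16, 16 },
};

/* Total pixel data of the four parts, in bytes, at 4 or 8 bits per pixel. */
uint32_t parts_bytes(unsigned bpp)
{
    uint32_t pixels = 0;
    assert(bpp == 4 || bpp == 8);
    for (int i = 0; i < 4; i++) {
        pixels += (uint32_t)PARTS_48[i].w * PARTS_48[i].h;
    }
    return pixels * bpp / 8;
}

/* Attribute bytes rewritten per frame when every part of every enemy
 * gets a new position and image address. */
uint32_t attr_bytes_per_frame(uint32_t enemies, uint32_t parts,
                              uint32_t bytes_per_part)
{
    return enemies * parts * bytes_per_part;
}
```

`parts_bytes(4)` gives the 1152 bytes quoted for a 4bpp 48x48 image, so the split wastes no VRAM; the cost is in the extra attribute writes, for instance `attr_bytes_per_frame(16, 4, 6)` is 384 bytes per frame.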

The heap manager in combination with the lru cache can help greatly with dynamically allocating those sprite image parts. However:

  • Each image part would have a handle pointing to the place on the heap, which again consumes memory. Instead of one handle the logic now needs to reserve more handles pointing to the images per sprite.
  • I would need to add an extra layer to manage the lru cache. Since sprites are composed of multiple images, the elements in the lru cache should point to an array of image offset handles, and not to the handles themselves ...
  • Allocating sprites on the heap and memory checking becomes more labour intensive. A sprite with 16 animations in a 48x48 setup results in 64 images allocated on the heap, thus 64 heap manager index entries.
  • For speed reasons the vera heap manager indexes are only one byte long. A maximum of 255 images can be stored in the heap manager because NULL value is also required. That would mean a maximum of 4 enemies (having 16 animation frames). No ... 3 enemies as the heap also manages player, bullets, particles, weapons, ground installations. Maybe even 2 enemies maximum possible. Or the heap manager should have 16 bit indexes increasing memory and decreasing speed. So it looks like the heap manager index size would become an issue.
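The handle budget in the last bullet is simple integer arithmetic. A small sketch, with hypothetical function names and the assumption that 255 of the 256 one-byte index values are usable because one is reserved as NULL:

```c
#include <assert.h>

/* Heap handles needed for one enemy: one per animation frame per part. */
unsigned handles_per_enemy(unsigned frames, unsigned parts)
{
    return frames * parts;
}

/* How many such enemies fit into one-byte heap indexes when `reserved`
 * handles are already taken by other objects (player, bullets, ...). */
unsigned max_enemies(unsigned frames, unsigned parts, unsigned reserved)
{
    const unsigned usable = 255;   /* 256 values minus the NULL index */
    assert(frames > 0 && parts > 0 && reserved <= usable);
    return (usable - reserved) / (frames * parts);
}
```

With 16 frames and 4 parts each enemy needs 64 handles, so `max_enemies(16, 4, 0)` is 3, and it drops to 2 once 64 or more handles are reserved for other objects. Undivided sprites (`parts == 1`) would allow 15.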

 

On 10/4/2022 at 12:34 AM, Ed Minchau said:

That dynamic heap manager makes things easier from the programmer's perspective,  but it adds a lot of computational overhead at run time.

In fact, not really. In the current setup, the heap manager really helps and it is very efficient. I would say it adds very little CPU overhead, some code overhead and some data overhead.

Regarding overhead:

  • The vera heap manager from a code perspective is about 0x600 bytes for the core heap functions. However, the heap needs data which adds another 0x0800 bytes. The indexes are managed in banked ram.
  • The lru cache is also an object that requires code and memory. It consumes about 0x0400 bytes of code, and the lru heap is located in main memory in address 0x0400. It consumes only 0x01FF bytes.

sv.

Edited by svenvandevelde

On 10/4/2022 at 12:28 AM, svenvandevelde said:

May it also be noted, that the sprite registers have a lot of unused areas, like register 3 and 5.

There are free bits which potentially could be used by vera for further settings. @Wavicle.

[Image: VERA sprite attribute register layout]

The existence of space in the register mapping doesn't automatically mean there are the on-chip resources to populate those resources ... but on the other hand the design seems to be I/O pin constrained, so here's hoping.

A similar saving to the 48x48 square sprites could be achieved with options of 32x64 and 64x32 rectangular sprites. 48x48 is 2,304 pixels, and 32x64 is 2,048 pixels.

Edited by BruceMcF

On 10/4/2022 at 1:56 PM, BruceMcF said:

A similar saving to the 48x48 square sprites could be achieved with options of 32x64 and 64x32 rectangular sprites. 48x48 is 2,304 pixels, and 32x64 is 2,048 pixels.

Very true; however, then the figures are either very flat or very wide, unless I'm misunderstanding the feedback. But yes, this has crossed my mind and I will use these in the game.

For lasers or for structures like walls with lasers etc.

 


On 10/4/2022 at 10:26 AM, svenvandevelde said:

Very true; however, then the figures are either very flat or very wide, unless I'm misunderstanding the feedback. But yes, this has crossed my mind and I will use these in the game.

For lasers or for structures like walls with lasers etc.

 

That is, if neighboring powers of two as vertical and horizontal dimensions are easier to add to the circuitry than a common dimension that is the sum of two powers of two.

If the use case is, "I would use a 64x64 here, but given the empty space that is wasting too much VRAM per sprite", the user side question is whether the empty space is going to be evenly spread vertically and horizontally, or whether it is going to be biased, either to taller objects with excessive horizontal empty space or to wider objects with excessive vertical empty space.


I agree with @ZeroByte: if we could add a flag to upscale a sprite, the sprite best suited to be upscaled is the 32-pixel one.

I have a thought: looking at the sprite registers, is anyone using the palette offset to its full potential? Maybe we could have 3 bits for sprite width and height such as

8 12 16 24 32 48 64 (7 in total) and still have 2 bits of palette offset to display the same sprite in  4 different hues.

 

 

 


VERA scaling is done for the whole display, not just for a layer or sprite. The current sprite dimensions go up to 64x64, so if you want something bigger, use more sprites. You have 128! And if you want 16x48, you can make the sprite 16x64 and have empty space on the vertical margins. Sprite asset addresses must be aligned to 32-byte blocks, but you can use some of this margin area to have overlapping sprites. A 4bpp 16x48 sprite means you have a total margin of 128 bytes within a 16x64 bitmap (256 bytes at 8bpp). Split that in half, and you can have 64-byte margins between each sprite, which means 8 rows of transparent pixels at 4bpp. The alignment still works because 64 % 32 == 0. So you're only "wasting" half of the VRAM that you're fearing, and it's only 64 bytes per sprite for 4bpp.
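The shared-margin layout can be worked out with a small helper. This is a sketch under stated assumptions: the image is vertically centered in the taller hardware sprite, consecutive sprites share one transparent margin, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SPRITE_ALIGN 32u   /* sprite data addresses are 32-byte aligned */

/* Bytes in one pixel row of a sprite. */
uint32_t row_bytes(uint32_t width, unsigned bpp)
{
    assert(bpp == 4 || bpp == 8);
    return width * bpp / 8;
}

/* Base address of the n-th hardware sprite when sprites of height
 * `sprite_h` hold images of height `image_h` and the bottom margin of
 * one sprite doubles as the top margin of the next. */
uint32_t overlapped_base(uint32_t first, uint32_t n, uint32_t width,
                         uint32_t sprite_h, uint32_t image_h, unsigned bpp)
{
    uint32_t margin_rows, stride;
    assert(image_h <= sprite_h);
    margin_rows = (sprite_h - image_h) / 2;   /* rows above and below */
    stride = (image_h + margin_rows) * row_bytes(width, bpp);
    assert(first % SPRITE_ALIGN == 0 && stride % SPRITE_ALIGN == 0);
    return first + n * stride;
}
```

For 16x48 images in 16x64 sprites the stride comes out as 448 bytes at 4bpp (instead of 512) and 896 bytes at 8bpp (instead of 1024), and both are multiples of 32, so every sprite stays aligned. The last sprite in a run still needs its full bottom margin.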


On 10/5/2022 at 6:58 AM, SlithyMatt said:

Sprite asset addresses must be aligned to 32-byte blocks, but you can use some of this margin area to have overlapping sprites.

Excellent point, I haven't thought of that! This means we have a virtual 64x48 dimension so for 8bpp it's 3072 bytes instead of 2304. So 768 bytes would be "wasted" compared to a 48x48 dimension. I think that's a good compromise.

Edited by AndyMt

On 10/5/2022 at 12:58 AM, SlithyMatt said:

... And if you want 16x48, you can make the sprite 16x64 and have empty space on the vertical margins. ...

Also, if VRAM is very tight, if you can design a 48x48 asset so that the top left hand and right hand 8 wide by 16 high is empty, it can be composed of one 32x32, one 16x32 to complete the base, and one 32x16 to fill out the top. Indeed, part of the visual story telling here might be that most of your antagonist characters are designed to fit into 16x16 sprites, and you have faced bosses that fit into 32x32 sprites, but now you are facing a "super boss", who visually fills even more of the screen ... so the top 48x16 space could well be a 16x16 head of a character sprite, saving even more space. Or you've faced a series of 16x16 fighter ships, and some 32x16 space frigates, and now you are facing the space destroyer ... where, again, it doesn't have to occupy all the extremes of a box to be visually bigger, and you have the  top and bottom of the "drive tail" as 16x16 sprites above and below the "back" edge of a 64x16 rectangular sprite. It's the size to fit into a single 64x48 asset, but since it's long and sleek with a tall tail, there's two 48x16 rectangles that are empty space and simply not required to be defined as part of the set of sprites that make up the asset.

Edited by BruceMcF

On 10/4/2022 at 4:34 PM, Fabio said:

I agree with @ZeroByte: if we could add a flag to upscale a sprite, the sprite best suited to be upscaled is the 32-pixel one.

I have a thought: looking at the sprite registers, is anyone using the palette offset to its full potential? Maybe we could have 3 bits for sprite width and height such as

8 12 16 24 32 48 64 (7 in total) and still have 2 bits of palette offset to display the same sprite in  4 different hues.

Sprites only come in 4bpp and 8bpp color depths. Palette offset is essentially meaningless in 8bpp mode (although it does affect the colors, and I'm sure there's some clever schema in arranging the palette and sprite assets to get some kind of effect from it, but that's going into mad science / demoscene territory). The palette offset value is designed such that you should think of the master palette as being divided into 16 different 16-color palettes. Palette offset selects which of these 16 palettes a 4bpp asset uses. I.e. palette_offset = row, pixel value = column.
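The row/column picture for 4bpp translates into one line of arithmetic. A minimal sketch with a hypothetical function name; it covers only the 4bpp case described above and does not model transparency (pixel value 0 is normally transparent for sprites).

```c
#include <assert.h>
#include <stdint.h>

/* Master palette index for a 4bpp pixel: palette_offset selects the row
 * (one of 16 sub-palettes), the pixel value selects the column. */
uint8_t palette_index_4bpp(uint8_t palette_offset, uint8_t pixel)
{
    assert(palette_offset < 16 && pixel < 16);
    return (uint8_t)(palette_offset * 16 + pixel);
}
```

So the same 4bpp asset drawn with palette offset 3 maps pixel value 5 to master palette entry 53, and changing only the offset recolors the whole sprite without touching its pixel data.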

 


On 10/5/2022 at 1:48 PM, ZeroByte said:

Sprites only come in 4bpp and 8bpp color depths. Palette offset is essentially meaningless in 8bpp mode (although it does affect the colors, and I'm sure there's some clever schema in arranging the palette and sprite assets to get some kind of effect from it, but that's going into mad science / demoscene territory). The palette offset value is designed such that you should think of the master palette as being divided into 16 different 16-color palettes. Palette offset selects which of these 16 palettes a 4bpp asset uses. I.e. palette_offset = row, pixel value = column.

One really cool palette offset effect at 4bpp would be to have some colors repeated in the base palette and then have them progressively change as a sprite "takes on damage". Another would be cycling through a sequence of palettes to "reflect the light" at greater and lesser intensity when there is an explosion toward the front side of a sprite.


Then a possible solution is to draw one enemy in the upper part of a 64*64 sprite and another in the lower part of another 64*64 sprite, and to overlap the tail of sprite 1 with the head of sprite 2.

Still, we could consider this solution:

horizontal width: 8 16 32 48

vertical height: 8 16 32 64

With this arrangement, if 48*48 is needed I can overlap the sprites, while if 64*64 is needed I can align two 32*64 sprites.

Does it sound like a good compromise?

Edited by Fabio
grammar

Hi Everyone, sorry I haven't responded in a bit. I've been letting this gel in my brain a bit.

If this were to become part of VERA, the extra sprite height/width bits would need to go in byte 5 of the sprite attribute structure. Making the VERA change to have this feature doesn't look like it will be difficult. For this to have a chance of getting seriously considered however, I need some sort of demo program that sets the new bits and renders 24- and 48-pixel wide/tall sprites in 4 and 8 bpp color. For practical reasons, the best place for this speculative feature development to happen is on Discord. I don't mind discussing the results here, but the development cycle when prototyping new candidate features needs a collaboration environment with lower latency. Ping me over on the Hardware / Video channel and we can discuss what a demo/proof of concept for this should look like.

(Standard disclaimer: I haven't promised anybody a pony; I'm not saying this definitely will be a thing, just that we will have to establish the risk is low and it doesn't burn too many of our remaining gates.)


On 10/6/2022 at 2:55 PM, Wavicle said:

Hi Everyone, sorry I haven't responded in a bit. I've been letting this gel in my brain a bit.

If this were to become part of VERA, the extra sprite height/width bits would need to go in byte 5 of the sprite attribute structure. Making the VERA change to have this feature doesn't look like it will be difficult. For this to have a chance of getting seriously considered however, I need some sort of demo program that sets the new bits and renders 24- and 48-pixel wide/tall sprites in 4 and 8 bpp color. For practical reasons, the best place for this speculative feature development to happen is on Discord. I don't mind discussing the results here, but the development cycle when prototyping new candidate features needs a collaboration environment with lower latency. Ping me over on the Hardware / Video channel and we can discuss what a demo/proof of concept for this should look like.

(Standard disclaimer: I haven't promised anybody a pony; I'm not saying this definitely will be a thing, just that we will have to establish the risk is low and it doesn't burn too many of our remaining gates.)

Waidaminute, 24 pixels? Now THAT is intriguing. There's only so many of the bigger assets that you want or need to have on screen at the same time, so cobbling 48 pixels out of squares and rectangles of 32 and 16 pixels is fine, but an 8 / 16 / 24 / 32 ladder, that's interesting.

Good luck all.

