Jump to content

A cheap list of bank key:value pairs (cc65)


rje
 Share

Recommended Posts

Here's the fuzzy problem:  I'm slowly writing an interpreter for the X16, and I therefore need a symbol table.  HOWEVER, I can't really guarantee that I'll have much system RAM for neat data structures.  So I'm looking at the banked RAM to store my hashtable.  ALL of it if possible.

THAT has implications ... for example I don't think malloc() and free() are really going to work in banked RAM.

SO:

***

I'm trying a "cheap" banked string map implementation using a list.  The idea is to have a small, slow, symbol table (less than 1000 entries, spanning several RAM banks) with a minimum of management code (and a minimum of system RAM).  It doesn't even hash.

I understand that it will be slow and linear.  That's OK.

My thoughts go like this:

RAM bank memory can be chopped up into 256 segments like this:

typedef struct {
 
      unsigned char header;
      char data[31];
 
} BANK_LIST_SEGMENT;
 
BANK_LIST_SEGMENT* bank_list[256]; // = 8192 bytes = 1 bank.
 

Key:Value entries are segment-aligned:  the shortest mapping is 32 bytes, and the increment is 32 bytes, to a max of 288 bytes.  I'll try ways to optimize this -- for example, reducing the segment length to 24 bytes. Or scrapping it for dynamic memory management, since this is a solved problem... especially if code size doesn't really change....

I can run through the list and check the first byte of each segment.

If it's a zero, then that hunk of 32 bytes is unallocated.
If it's a printable PETSCII, then it's allocated and part of the current pair.
If it's 1 thru 31, then it's a new pair, and the key length is given (1-31 characters), so I know where the value string starts. (Or I can limit it to 30 chars, with a "31" indicating the start of the value string of the current pair for convenience.)

Keeping the key length known AND under 32 characters lets me do strncmp's safely (?).  I may have a problem with values if they don't end in zeros -- so I have to make sure then end in zero.  This may have length implications, especially if the zero intersects with byte zero of the next segment.  That means I'll have to store the length of the value string, which OK FINE.

 

 

Maybe this has already been done in a better way already.  I know hashes have been done-to-death in various ways.  Thoughts?

Edited by rje
Link to comment
Share on other sites

Posted (edited)

I'm just being lazy.  I've got two other algorithms -- one stolen from Perl -- that create bona fide linked and hashed ... uh, hashes.  I just get the feeling that I can do this in a more primitive way in order to leverage banked RAM and totally minimize system RAM.  Maybe - ha!

 

 

Edited by rje
Link to comment
Share on other sites

Posted (edited)

I had thought about a true hashed structure, with pointer-like things.

A "pointer" could be broken down into two parts like this:

typedef struct {
 
int offset : 13; // inside bank (13 bits = 8192 bytes)
int bank : 3; // eight banks, max 64K RAM.
 
} BANKED_HASHMAP_POINTER;
 

So then, navigating through a BANKED linked list of hash entries is a matter of (a) setting the bank and then (b) heading to the offset (0xa000 + offset). 

Then two more bytes to hold a cheap hash value.

Four bytes overhead per pair isn't bad.  A thousand entries = half a bank is devoted to overhead.  Hmm well that doesn't sound great but meh.

The memory management code might be a little more involved.

Edited by rje
Link to comment
Share on other sites

On 4/5/2022 at 2:25 PM, rje said:

I'm trying a "cheap" banked string map implementation using a list.  The idea is to have a small, slow, symbol table (less than 1000 entries, spanning several RAM banks) with a minimum of management code (and a minimum of system RAM).  It doesn't even hash.

If less than 1000 entries, with a max of 30 character long keys, you can have linked lists of symbols with the same length (Bank & Segment in Bank pointer) with the first block having the first two segments dedicated to the heads of each list. That leaves four free bytes, so the first pair of bytes in the bank is the pointer to the first free segment and the second pair of bytes is the pointer to the last available segment. You know from the length of the symbol you are looking up which linked list to use.

Note that that if you have a page buffer for transitory things, you could have a byte length for a value field of up to 224 bytes, and even if it spans HighRAM segment boundaries you could still fetch the matching key and value into the buffer. The redundant two byte length field in the key could be used to store the length of the key and the length of the value.

Edited by BruceMcF
  • Thanks 1
Link to comment
Share on other sites

On 4/5/2022 at 5:28 PM, BruceMcF said:

If less than 1000 entries, with a max of 30 character long keys, you can have linked lists of symbols with the same length (Bank & Segment in Bank pointer) with the first block having the first two segments dedicated to the heads of each list. That leaves four free bytes, so the first pair of bytes in the bank is the pointer to the first free segment and the second pair of bytes is the pointer to the last available segment. You know from the length of the symbol you are looking up which linked list to use.

Sounds a lot like how I managed virtual Commodore disk images.  Thanks for the good suggestion!

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...

Important Information

Please review our Terms of Use