desertfish

New productivity upload: file based assembler

Nice progress @desertfish!

It mostly works as expected; I tried it with my own simple "hello world".

I also put the string at the end of the code as in your own hello world test. I tried to load each character in the loop with lda message,x and lda message,y, but that failed. Apparently the assembler did not recognize a label defined later in the code. It worked fine when I changed "message" to a fixed hexadecimal address.

If 32-character symbols turn out to be too heavy, you could always do what has been done in other languages, for instance Forth:

  • Store only the first few characters of a symbol in the symbol table
  • Also store some other metadata in the symbol table, for example the length and/or a checksum, thereby minimizing false duplicates
  • This could save space and speed up assembly
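
A minimal Python sketch of the idea in the list above. The 4-character prefix, the 8-bit XOR checksum, and the name `pack_symbol` are illustrative assumptions, not part of the assembler:

```python
PREFIX_LEN = 4  # hypothetical choice: how many leading characters to keep


def pack_symbol(name: str) -> bytes:
    """Pack a symbol name into a fixed-size entry: prefix, length, checksum."""
    data = name.encode("ascii")
    # Keep only the first PREFIX_LEN bytes, zero-padded for short names.
    prefix = data[:PREFIX_LEN].ljust(PREFIX_LEN, b"\x00")
    # 8-bit XOR of all bytes of the full name.
    checksum = 0
    for b in data:
        checksum ^= b
    return prefix + bytes([len(data), checksum])


entry = pack_symbol("screen_width")
print(len(entry), entry)
```

Each entry takes `PREFIX_LEN + 2` bytes regardless of the name's length, at the cost that two different names can map to the same entry.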

Hi, thanks for trying it out. Can you post the failing program? It should be able to deal with symbols that are not defined yet.

* = $8000
 
CHROUT = $FFD2
 
LDY #0
LOOP:
LDA MSG,Y
BEQ EXIT
JSR CHROUT
INY
BRA LOOP
EXIT:
RTS
 
MSG:
.STR "HELLO, WORLD"
.BYTE 0

I've fixed the issues in a new upload. It turned out the assembler wasn't correctly handling absolute-indexed addressing with symbols.
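
For readers wondering how a label such as MSG can be used before it is defined, here is a rough Python sketch of one common technique (backpatching). It is not how this assembler is actually implemented; the function `assemble` and its tiny supported subset (labels, `LDA sym,Y`, `RTS`, `.BYTE`) are assumptions made for illustration. Only the opcodes $B9 (LDA absolute,Y) and $60 (RTS) are real 6502 values.

```python
def assemble(lines, origin=0x8000):
    """Assemble a tiny subset, patching forward references at the end."""
    code = bytearray()
    symbols = {}   # label name -> address
    fixups = []    # (offset of the 2-byte operand in code, label name)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            symbols[line[:-1]] = origin + len(code)
        elif line.startswith("LDA ") and line.endswith(",Y"):
            code.append(0xB9)                       # LDA absolute,Y
            fixups.append((len(code), line[4:-2].strip()))
            code += b"\x00\x00"                     # placeholder operand
        elif line == "RTS":
            code.append(0x60)
        elif line.startswith(".BYTE "):
            code.append(int(line[6:]) & 0xFF)
        else:
            raise ValueError(f"unsupported line: {line!r}")
    # All labels are known now, so fill in the placeholders (little-endian).
    for offset, name in fixups:
        if name not in symbols:
            raise KeyError(f"undefined symbol: {name}")
        address = symbols[name]
        code[offset] = address & 0xFF
        code[offset + 1] = address >> 8
    return bytes(code)


program = ["LDA MSG,Y", "RTS", "MSG:", ".BYTE 0"]
print(assemble(program).hex(" "))
```

Always reserving a two-byte absolute operand for a not-yet-known symbol keeps instruction sizes fixed, so addresses assigned to later labels do not shift when the placeholder is filled in.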

About the symbol table: how does Forth deal with collisions?
Prefix+length is way too simple ("name1" and "name2" would be the same entry), and adding a "checksum" (I suppose you mean a hash) will only work to a certain extent. There is still no guarantee that we won't have collisions.
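
To make the concern concrete, a small Python sketch follows. The 3-character prefix and both function names are assumptions chosen for illustration. It shows that prefix+length merges "name1" and "name2", that an XOR checksum separates them, and that the checksum still cannot tell apart names whose tails are permutations of each other:

```python
def prefix_length_key(name, prefix_len=3):
    """Key made of the first few characters plus the name length."""
    return (name[:prefix_len], len(name))


def prefix_length_xor_key(name, prefix_len=3):
    """Same key, extended with an 8-bit XOR checksum over the full name."""
    checksum = 0
    for b in name.encode("ascii"):
        checksum ^= b
    return (name[:prefix_len], len(name), checksum)


# Prefix+length alone cannot distinguish these two names.
print(prefix_length_key("name1") == prefix_length_key("name2"))
# The checksum separates them...
print(prefix_length_xor_key("name1") == prefix_length_xor_key("name2"))
# ...but not names that differ only in the order of later characters.
print(prefix_length_xor_key("count_xy") == prefix_length_xor_key("count_yx"))
```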

According to the book Starting Forth, the Forth-79 standard allowed symbol names of up to 31 characters. But some variants of Forth only stored three characters + the length of the symbol. "name1" and "name2" would then be the same, which is not ideal.

I don't suggest that you copy that approach as is. It's more an inspiration.

There's always a risk of collisions. Checksums/hashes are probably better than symbol length. Storing only the first three characters is probably too few.

Well, a correct hash table implementation will deal with collisions (collision lists or another solution).

Not dealing with possible collisions will result in bugs in your resulting machine code that are extremely hard to track down. Without any warning, certain symbols will suddenly pick up the value of others... I can't imagine that's acceptable in Forth either. I assume it uses a trick to deal with this as well.
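
As an illustration of the "collision lists" approach, here is a hedged Python sketch of a hash table with separate chaining. The class name, the bucket count of 8, and the deliberately weak byte-sum hash are assumptions. The point is only that the full name is stored and compared, so a hash collision costs time but never returns the wrong value:

```python
class ChainedSymbolTable:
    """Hash table with one collision list (chain) per bucket."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, name):
        # Deliberately weak hash (sum of bytes) so collisions are easy to provoke.
        return sum(name.encode("ascii")) % len(self.buckets)

    def define(self, name, value):
        chain = self.buckets[self._index(name)]
        for i, (existing, _) in enumerate(chain):
            if existing == name:          # same full name: update in place
                chain[i] = (name, value)
                return
        chain.append((name, value))       # new name, possibly a hash collision

    def lookup(self, name):
        for existing, value in self.buckets[self._index(name)]:
            if existing == name:          # full-name comparison resolves collisions
                return value
        return None


table = ChainedSymbolTable()
table.define("ab", 0x8000)
table.define("ba", 0x8010)   # same byte sum as "ab", so same bucket
print(hex(table.lookup("ab")), hex(table.lookup("ba")))
```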

Replying to Stefan's post above about Forth-79 and three-character names:

In the Forths that stored three characters (e.g., the original FIG Forths), collisions were dealt with simply: the first one defined was the one stored in the dictionary, "good grief, keep track of what you are doing, you idjit". This is similar to CBM BASIC variable names, except that names of different lengths with the same first three letters were also distinct. In ANS Forth and its successors, some implementations use hashing to speed up dictionary searches, but the entire name is stored.

Replying to desertfish's point above about unhandled collisions:

Unless you plan to allow redefined symbols, the assembler could throw an error when it encounters a duplicate definition, whether it is an actual duplicate or a false match.
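
A short Python sketch of that suggestion, assuming a table keyed only by a 3-character prefix plus length. The class and exception names are hypothetical. A definition whose key is already taken is rejected, whether it is a real duplicate or a false match:

```python
class DuplicateSymbolError(Exception):
    """Raised when a new symbol maps to a key that is already taken."""


class TruncatedSymbolTable:
    def __init__(self, prefix_len=3):
        self.prefix_len = prefix_len
        self.entries = {}  # (prefix, length) -> value; full names are not kept

    def _key(self, name):
        return (name[:self.prefix_len], len(name))

    def define(self, name, value):
        key = self._key(name)
        if key in self.entries:
            # Could be a true redefinition or a different name with the same key.
            raise DuplicateSymbolError(f"symbol {name!r} clashes with entry {key}")
        self.entries[key] = value

    def lookup(self, name):
        return self.entries.get(self._key(name))


table = TruncatedSymbolTable()
table.define("name1", 0x8000)
try:
    table.define("name2", 0x8010)
except DuplicateSymbolError as err:
    print("assembly error:", err)
```

Note that this only catches clashes between definitions. A reference to a name that was never defined but happens to share a key with a defined one (for example "name3" here) would still resolve silently.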

That is an interesting idea. Sometimes "good enough" is good enough and we can think of a different symbol name to satisfy the assembler.

No, not really... All good string hash functions use multiplications, and the 6502 has no multiply instruction, so we can't do that and still keep it fast...
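
For comparison, multiplying by a small constant can be rewritten as shifts and adds, which the 6502 does have. The Python sketch below uses the well-known djb2 string hash (multiply by 33, add the byte) truncated to 16 bits. The function name and the 16-bit width are assumptions, and whether the equivalent 6502 code would be fast enough for this assembler is an open question that only measurement could answer:

```python
def djb2_16(name: str) -> int:
    """djb2 hash truncated to 16 bits, with h * 33 written as (h << 5) + h."""
    h = 5381
    for b in name.encode("ascii"):
        # (h << 5) + h equals h * 33; the mask keeps the result in 16 bits.
        h = ((h << 5) + h + b) & 0xFFFF
    return h


for label in ("LOOP", "EXIT", "MSG"):
    print(label, hex(djb2_16(label)))
```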

Replying to desertfish's point above about hash functions needing multiplication:

The question is what you are aiming for with the hash. One approach is a hash that aims to avoid collisions, so that collisions are a special case. Another is simply to speed things up compared to a single sorted linked list or binary tree by having more, but substantially smaller, linked lists or trees, so that they do not get bogged down to the same extent as the wordspace grows.

Then something as simple as the bottom four bits of the XOR of the bytes in the name may serve well.
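
A minimal Python sketch of that bucket selection, using the labels from the hello world listing earlier in the thread. The function names are illustrative. On the 6502 this would come down to an EOR per byte and a final AND #$0F:

```python
NUM_BUCKETS = 16


def bucket_of(name: str) -> int:
    """Bottom four bits of the XOR of all bytes in the name."""
    x = 0
    for b in name.encode("ascii"):
        x ^= b
    return x & 0x0F


def build_buckets(names):
    """Distribute names over 16 short lists instead of one long one."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for name in names:
        buckets[bucket_of(name)].append(name)
    return buckets


labels = ["CHROUT", "LOOP", "EXIT", "MSG"]
for index, chain in enumerate(build_buckets(labels)):
    if chain:
        print(index, chain)
```

Each list still stores and compares full names, so the hash only narrows the search and never decides a match by itself.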

The current symbol table implementation is simplistic, but it should be easy to replace with a smarter one because it has a very basic interface to the assembler logic itself. I don't think I have the time to build a smarter symbol table myself, so hopefully someone else can jump in, who knows 🙂

On 2/28/2021 at 7:07 AM, Stefan said:

Sure is.

Any idea about a suitable hash function?

https://en.wikipedia.org/wiki/Linear-feedback_shift_register#Galois_LFSRs

https://github.com/eternal-skywalker/cx16-lib/blob/main/lfsr.s

David Murray mentioned using an LFSR to generate random maps in a game instead of manually creating and storing them.

A hash function needs more than this, but it is something to start with.
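
A hedged Python sketch of a 16-bit Galois LFSR step, using the tap mask 0xB400 and seed 0xACE1 from the 16-bit example in the Wikipedia article linked above. The `lfsr_hash` function is purely an illustrative assumption about how name bytes could be mixed in (XOR a byte into the state, then clock eight times, much like a CRC). It is not taken from the linked lfsr.s:

```python
TAPS = 0xB400   # x^16 + x^14 + x^13 + x^11 + 1, as in the Wikipedia example
SEED = 0xACE1


def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Galois LFSR by one bit."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def lfsr16_period(start: int = SEED) -> int:
    """Count steps until the register returns to its starting state."""
    state = lfsr16_step(start)
    count = 1
    while state != start:
        state = lfsr16_step(state)
        count += 1
    return count


def lfsr_hash(name: str, seed: int = SEED) -> int:
    """Hypothetical hash: XOR each byte into the state, then clock 8 times."""
    state = seed
    for b in name.encode("ascii"):
        state ^= b
        for _ in range(8):
            state = lfsr16_step(state)
    return state


print(hex(lfsr_hash("name1")), hex(lfsr_hash("name2")))
```

The step itself needs only a shift, a branch, and an XOR, all of which are cheap on the 6502.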

 

Edited by Terrel Shumway

Updated the assembler: added a feature to save the assembled program to disk.

(Note that assembling is still done into system memory first as an intermediate step. This will be changed in a future version to allow assembling larger programs.)
