Jump to content
  • 0
Stefan

Native assembler, preferably open source

Question

Hello,

Is there an assembler for 6502 that could be made to run natively on the X16?

Preferably it would:

  • Be open source
  • Read source code from files
  • Assemble directly to files
  • Support at least named labels, basic compiler directives (for instance .byte, .word, .text) and macros, simliar to the Commodore Macro Assembler that you could use on the C64

I've been searching, but it seems hard to find.

I used the Commodore Macro Assembler a lot with my C64 in the 80s. Even though I had a disk drive, it was cumbersome and took quite some time to edit, save, assemble, load and test each change in a program. If the real X16 reads and writes SD card files at the same speed as the emulator, we would be getting much closer to a modern experience with a file based assembler.

Share this post


Link to post
Share on other sites

Recommended Posts

  • 0
On 1/16/2021 at 9:02 AM, Stefan said:

From the discussion in the File I/O Performance thread I think we know how to optimize the speed of file reading:

  • Disable interrupts
  • Select Kernal ROM before the read loop
  • Maybe getting the status value directly from memory instead of calling READST

The assembler you're working on is very promising.

I guess the next big step is supporting symbols/labels.

These are the thoughts I have had on that subject:

  • The symbol table entries could have a fixed length, making it easier to store and search them
  • The symbol names could be long but not more than maybe 256 characters; to limit the size of the table you could opt store not the whole name, but for instance this information that is used for searching/matching symbols:
    • A suitable number of characters from the start of the symbol name (for instance 6 characters)
    • The total length of the symbol (Forth does this)
    • An 8 bit checksum (for instance just adding the character ASCII values together).
  • The symbol search could start from the end of the table. This makes it possible to redefine symbols, as the assembler will find the symbol that was defined last. This method is used in Forth.
  • Redefined symbols could be used to support scopes, which I find very valuable in ca65. When starting a new scope the assembler needs to store the index of the last item in the symbol table. New symbols defined within the new scope are added to the symbol table at the end. When checking for not allowed duplicate symbols, the search would start from the end of the table and stop at the index where the scope started. And when leaving the scope the pointer to the last symbol would be reset to the index before the scope started, effectively forgetting all symbols created within the scope. These are not in any way my ideas, it is just a combination of Forth and David Salomon's book that I mentioned above in this thread.

One suggestion: use a hash table. I know that sounds adventurous on a 6502, but it will make a HUGE difference when assembling large programs.

Up to 80% of the time spent assembling large programs can be spent in symbol table lookups if you simply use linear search. I've measured that.

My really dumb solution, crafted at 17 years of age while I was afraid to waste time calculating a hash function: hash each symbol on its starting letter. Effectively, I created 26 linked lists. Each symbol was allocated consecutively in memory, and linked into the list for its starting letter. But even that was enough for 3-4x performance gain.

 

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

Many years ago I had an assembler on the Plus/4 that a guy in the California Plus4 users group mailed me on disk.   It was a modified version of the SPEEDSCRIPT word processor.   It was two pass, I think, and I did a few things with it when I was ready to get beyond futzing around in MONITOR and keeping track of my own addresses/variables/jumppoints in on scrap paper.    The source came with the disk, so maybe it would be an interesting native assembler for the Commander X16.    Let me search around a bit and see if I can find it on any of the Plus4 sites. 

Edited:    Man, the guys on Plus4 World have EVERYTHING.   They really do.    Here's what I was talking about:

http://plus4world.powweb.com/software/ASMP4

I've attached the readme from the archive at the link, to this post.     Not sure if anything there would be of help to someone working on a native X16 assembler but who knows.   The program was highly tied to Speedscript and its commands/editing interface, which was just as text-based-"lookit up or memorize it" as all the other wordprocessors of the day, but maybe there's something in the code that would be interesting?   Lookis like the source code is included, but I think it has to be read with SPEEDSCRIPT as I can't pull it up in a text editor.   

Was speedscript stored in Screen code?   Maybe that's the issue.    I'll have to install the VICE +4 emulator and see what I come up with. 

ASMP4HLP.ASC

Edited by Snickers11001001

Share this post


Link to post
Share on other sites
  • 0

@Dejital I think I found a simple but still fairly good "hash" function based on the first two and last letter and the length.  I've been experimenting with it a little, but it's not yet integrated in the assembler because the actual hash table isn't programmed yet.  The function results in 128 buckets so a lot less collisions than when just looking at the first letter.

 

Share this post


Link to post
Share on other sites
  • 0
23 hours ago, desertfish said:

@Dejital I think I found a simple but still fairly good "hash" function based on the first two and last letter and the length.  I've been experimenting with it a little, but it's not yet integrated in the assembler because the actual hash table isn't programmed yet.  The function results in 128 buckets so a lot less collisions than when just looking at the first letter.

 

I monkeyed around in PHP to test a few ideas - I piped the aspell dictionary into my algorithm and counted hash collisions. The best I could come up with was the following bytes:

[chars1-3] + N + M1 + M2 + M3
(delete first 3 chars and compute N, M1-3 using the remaining string)

N = num chars + value of final char

The M's are uint8 modulos that just increment for each character, rollover accepted.

M1 is simple M1 += char(x), M2 += char(x) % 17, and M3 += char(x) % 23.

I played around with the modulo values for M2 and M3 and they don't affect the performance very much one way or the other.

The dictionary contains ~130k words, and has 2637 hashes with collisions, the most collisions for a single hash = 9.

So not great, but not terrible either, and it should be fairly fast to compute. I also played around with using char() >> 2 sort of thing instead of %prime, and that got a lot more collisions, but it would be faster than modulo.

 

Share this post


Link to post
Share on other sites
  • 0

the function i was experimenting with is: 

((c0 + clast + length) ^ (c1*4)) & 127

where c0 = first char, c1 = second char, clast= last char, length=length of name.  it doesn't use modulo at all

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

I tried some XOR stuff and got much better performance:
1st3 chars . Nchar+lastChar . M1 . M2 . M3

where M1-3 all start at zero and for each char:

M1 ^= Char, M2 ^= Char>>1, M3 = M3<<2 ^ Char>>2

This gives only 486 hashes that have collisions.

Whoops - my code didn't clamp the results of the M's to a single byte - when I did that it got a lot worse, lol.

Edited by ZeroByte
whoops

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

tweaked the function a bit, I now have this

(c0 + clast + c1*4) ^ (length*4) & 127

resulting in this distribution over 128 hash buckets for random labels generated from dictionary words:

image.thumb.png.79a39bcdf3644e125070ce80bd541811.png

This assumes we know the length of the symbol beforehand.

Otherwise you can just as well simply add up all the letters making up the symbol while scanning it to determine the length:  (this also seems to yield better results when I feed it with shorter label names):

image.thumb.png.55947bc1c0d1a1b4f31870cadb27a409.png

Edited by desertfish

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

Wouldn't you at least want 256 hash buckets? 128 labels seems awful tight.

My 24-bit hash seems pretty good now: $n$o$p
$n starts at zero and for each iteration $n += $i | ($i << 4), and at the end it ^= the first char of the label
Then $o and $p are simple ^ of all even or odd chars.

Results:

There were 102492 hashes generated. (all words)
  1 : 86212  84.1%
  2 : 12869  12.6%
  3 : 2651    2.6%
  4 : 592     0.6%
  5 : 139     0.1%
  6 : 26      0.0%
  7 : 2       0.0%
  9 : 1       0.0%

There were 36774 hashes generated. (8-char or less)
  1 : 31406  85.4%
  2 : 4246   11.5%
  3 : 896     2.4%
  4 : 170     0.5%
  5 : 46      0.1%
  6 : 10      0.0%

Edited by ZeroByte

Share this post


Link to post
Share on other sites
  • 0

the buckets can contain 1 or more labels. (collison)   so yeah if you have collision, you still have to scan in the bucket list, but it will be a LOT shorter

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

Oh I see why 128 buckets now - because 256 could consume a lot more memory for not a lot of savings in search time. Using a larger hash where you only store collisions could end up taking longer because of the logic required to dynamically allocate buckets. I do tend to over engineer stuff. Mostly I was just having fun trying to find a hash algorithm with minimal collisions. 😉

Entropy's a strange beast - you can do all kinds of twiddling values around, but in this type of problem, if you're not introducing entropy, then you're not helping the hash be more unique.

Edited by ZeroByte
  • Like 1

Share this post


Link to post
Share on other sites
  • 0

Yeah it was a stupid trial and error process to come up with that function above. Using a dictionary + statistical plot in a separate tool was a great idea (i used python) I was going for 256 buckets first but the distribution was extremely skewed to one half of that so I now simply and with 127 to get rid of that

It is what it is, I'm now going with the very simple function of just adding all letters and xoring the length -- that seems to behave fairly well and is very fast to calculate

In any case the hash function is trivial to replace with a better one later.

  • Like 1

Share this post


Link to post
Share on other sites
  • 0
25 minutes ago, desertfish said:

I was going for 256 buckets first but the distribution was extremely skewed to one half of that so I now simply and with 127 to get rid of that

I realized the reason for this - the char set only really distributes over 7 bits' worth of space, so the 8th bit never gets any entropy. I saw where you were <<4 in your function, as I had also done in my function. This was because I realized that you're never going to realistically get enough chars to put any bits into the high-order so I figured just pretend it's a nibble and that'll add some entropy. Since it's then adding to a uint8, (i.e. modulo 256) that makes it become actual entropy instead of just "the same value twice"

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

Exiting results.

The use of the hash table provides a massive speedup.  It now assembles around 32 kilobyte of source code with 700 symbols in it (basically a giant sequence of labels and JMPs to those labels) in just over 5 seconds.

I've updated the file in the downloads section to this new version.

Edited by desertfish
  • Like 4

Share this post


Link to post
Share on other sites
  • 0

Hey stefan thanks for checking it out again, yeah it's coming along nicely so far.

Although there are some things in the TODO list now that I don't have any idea how to implement them yet or how much time it will cost.  It can be a long time befire it is an actual useful assembler tool ... 😐 

 

One thing that could help to make it more practical to work with is when used together with your text editor, that there is a way to switch between them without having to load the other tool from disk again and again to edit-assemble-edit-assemble.  Can you think of a possible way to have both in memory at the same time somehow? and easily swap between them?

Share this post


Link to post
Share on other sites
  • 0

One option is to store the assembler and the editor in ROM. Or at least one of them. X16 Edit already supports this.

I've made a PR in the KERNAL project so we would have a BASIC command to start ROM based programs. I call this command START, and you would type for instance START "ASSEM". That would make it easy to start and switch between programs stored in ROM. But I don't yet know if there is any interest. And I don't know how easy it will be for the end user to add custom programs to the ROM. 

Without a START command in BASIC you could have a routine in your assembler that starts up the ROM based editor. That's what's done in Volksforth.

I read somewhere that you live in the Netherlands. Many years ago I studied there for one semester, in the city of Tilburg. It was a lot of fun. Have a nice weekend!

  • Like 1

Share this post


Link to post
Share on other sites
  • 0
7 hours ago, Stefan said:

a routine in your assembler that starts up the ROM based editor.

I don't think the assembler would work in ROM and I don't quite understand all the steps required to make it work like that. So this suggestion should be the easiest to implement.

Yup I'm from the Netherlands, but I don't come to the southern parts much (where TIlburg is located).  I've worked in Tiburg for a couple of months though, once.  I don't remember very much from it to be honest 🙂  

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

Not much to remember about Tilburg, I'm afraid 🙂

It's not very easy, but I wouldn't say it's very hard either to make a program romable.

Some pointers:

  • The ROM image must fill a whole ROM bank. That is 16,384 bytes. No more, no less.
  • In order for the computer not to crash, $fffe-ffff in your ROM bank must point to an interrupt handler in RAM. The KERNAL handler is located at $038b, called "banked_irq" in the KERNAL source code. You can let $fffe-ffff point to this.
  • You need a way to call KERNAL functions in ROM bank 0
    • You cannot call a KERNAL function directly from another ROM bank. There has to be bridge code in RAM.
    • This is so because if your ROM code tried to directly switch ROM bank and call code in another ROM bank, as soon as the ROM bank is switched code execution continues at the same address but in the new ROM bank. And your ROM code looses control.
    • One solution is to include the KERNAL source "inc/jsrfar.inc". It uses a small code stub stored in RAM called jsrfar3, located at $02c4
    • Then your ROM code should be able to call functions in other ROM banks with jsrfar in the normal way
  • Finally, if you need or want to call KERNAL functions at their normal addresses, like $ffd2 for outputting a character, you need to implement this yourself in the ROM bank
    • First you would have a "list" of jumps situated at the desired place near the end of the ROM bank, like $ffd2
    • This would jump to a function in your ROM code that does the jsrfar.

I don't know if I'm very good at explaining this. I think it's easier than what it sounds. If you're interested I could make a small example implementation.

EDIT: Attached is a minimal working ROM, only supporting call to $FFD2

  • banks.inc and jsrfar.inc are copied from the KERNAL project
  • test.s is the ROM image source code
  • test.cfg is memory configuration for CA65 assembler
  • build.sh is a build script

banks.inc build.sh jsrfar.inc test.cfg test.s

You need to append the ROM image to the default ROM image you got with the emulator (file name rom.bin). On Linux/MacOS you could type:

cat rom.bin test.bin > customrom.bin

When starting the emulator you need to specify the custom rom with the -rom switch.

To start the ROM code you need to enter startup code in RAM, for instance in the monitor like this:

lda $01
pha
lda #$07
sta $01
jsr $c000
pla
sta $01
rts

 

Edited by Stefan
  • Like 1

Share this post


Link to post
Share on other sites
  • 0

This is good stuff, however there is a fundamental issue: I'm using prog8 to develop the assembler and the prog8 compiler assumes the program ends up in RAM. It generates code around that principle (embedded variables, sometimes even modifying code).

Share this post


Link to post
Share on other sites
  • 0
1 hour ago, desertfish said:

This is good stuff, however there is a fundamental issue: I'm using prog8 to develop the assembler and the prog8 compiler assumes the program ends up in RAM. It generates code around that principle (embedded variables, sometimes even modifying code).

I understand. That I cannot help you with.

The only viable option for you might be to let the assembler stay in RAM and use the ROM based editor. You could make a function in the assembler to start the editor to make that a bit easier. At least until there is a better solution for starting user-added ROM based programs in the Kernal.

If your assembler is using parts of banked RAM or the golden RAM, that should not be a problem. On startup you may set what parts of banked RAM the editor may use. And on startup the editor takes a snapshot of golden RAM and restores it on exit. This functionality has not been thoroughly tested. But hopefully it's working.

  • Thanks 1

Share this post


Link to post
Share on other sites
  • 0

@Stefan great job on that rom version of x16 edit, it was surprisingly easy to incorporate it:

sub edit_file(uword filename) {
    cx16.rombank(7)     ; activate x16edit, assumed to be in rom bank 7
    if filename {
        cx16.r0 = filename
        cx16.r1L = string.length(filename)
        %asm {{
            phx
            ldx  #1
            ldy  #255
            jsr  $c003
            plx
        }}
    } else {
        %asm {{
            phx
            ldx  #1
            ldy  #255
            jsr  $c000
            plx
        }}
    }
    cx16.rombank(4)
}

 

Does also save/restore the zero page that it uses?

 

Share this post


Link to post
Share on other sites
  • 0
Posted (edited)

Yes, it's supposed to restore the zero page it uses as well.

It's done by copying the following memory ranges to banked RAM:

  • Zero page: $22-$34
  • Golden RAM: $0400-$07FF
Edited by Stefan
  • Like 1

Share this post


Link to post
Share on other sites
  • 0

Fantastic then it's correct that the only thing I had to worry about is keeping the X register safe (prog8 uses it internally as a stack pointer of sorts). 
It's so much more convenient now to not have to reload both applications all the time 😎

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Answer this question...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.


×
×
  • Create New...

Important Information

Please review our Terms of Use