New productivity upload: X16 Edit - a text editor


Stefan

On 2/18/2022 at 10:24 PM, Stefan said:

The help screen in X16 Edit is embedded into the executable as ASCII text, and takes quite a lot of space.

I've been looking for ways to make the executable smaller. One interesting option is to compress the text with Huffman coding:

https://en.wikipedia.org/wiki/Huffman_coding

After a lot of fiddling, I finally got it to work.

The compression is done by a Python script. Decompression is done in 65C02 assembly within the X16 Edit executable.

But how well does it work? Some metrics:

  • Original help screen text size: 1,818 bytes
  • Compressed text size: 1,093 bytes (60 % of the original)
  • Lookup tables needed for decompression: 194 bytes 

And the 65C02 decompression code also takes some extra space. In my first working implementation, the executable became 422 bytes smaller, a saving of about 23 %.
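For anyone curious, the table-building side of Huffman coding is only a few lines of Python. This is a simplified, generic sketch of the idea (not the actual script used in X16 Edit): heapq builds the code tree, and the resulting bit stream is packed into bytes.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman code table mapping each character to a bit string."""
    freq = Counter(text)
    # Heap entries are (weight, tiebreaker, tree); a tree is either a
    # single character or a (left, right) pair of subtrees.
    heap = [(w, i, ch) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        n += 1
        heapq.heappush(heap, (w1 + w2, n, (t1, t2)))
    codes = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix or "0"
    walk(heap[0][2], "")
    return codes

def compress(text):
    """Encode text as a packed byte string plus its code table."""
    codes = huffman_codes(text)
    bits = "".join(codes[ch] for ch in text)
    # Pack the bit string into bytes, zero-padding the final byte.
    packed = bytes(int(bits[i:i + 8].ljust(8, "0"), 2)
                   for i in range(0, len(bits), 8))
    return packed, codes
```

The 65C02 side then only needs the lookup tables and a bit-by-bit tree walk to decode.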

Another issue is speed. The help screen is not displayed instantly, but I think it's still quite fast. I uploaded a video so you can see the decompression code in action for yourself.

Compressed help screen.mov

The text compression I'm using for Asteroid Commander is a little different and might work better for you. First, I put all the text I have so far in a single file. Then I copied it into a word cloud generator, and set that up to not ignore small common words like a, the, an, at, etc.

From that, I created a vocabulary of the most common words. I chose space for 384 words, but I have a lot more text; you might be able to get away with 64 words. So my vocabulary list is just two-byte pointers to the start of each word. The vocabulary is stored in VRAM, the pointers are in low RAM. All the words are stored as their ASCII codes, but on the last letter bit 7 is set to 1, similar to the way error messages are stored in ROM.

To compress the text with only 64 words in the vocabulary, use codes $80-$BF to represent those 64 words. Then to decompress, when a code in the $80-$BF range is encountered, print a space and then extract the vocabulary word from VRAM.

To get 384 words I'm using codes $60-$BF and $E0-$FF for one set of words (the short words) and the code $01 followed by another byte for a set of 256 words.

I'm getting about 3:1 compression using this method.
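In Python terms (just a model of the logic; the real code is 65C02 assembly reading the vocabulary out of VRAM), the 64-word scheme looks roughly like this:

```python
def encode_words(text, vocab):
    """Replace ' word' sequences with one code byte in the range $80-$BF."""
    out = bytearray()
    tokens = text.split(" ")
    out += tokens[0].encode("ascii")   # first word has no preceding space
    for word in tokens[1:]:
        if word in vocab:
            out.append(0x80 + vocab.index(word))   # code = $80 + index
        else:
            out += b" " + word.encode("ascii")
    return bytes(out)

def decode_words(data, vocab):
    """Expand each code byte $80-$BF to a space plus the vocabulary word."""
    out = []
    for b in data:
        if 0x80 <= b <= 0xBF:
            out.append(" " + vocab[b - 0x80])
        else:
            out.append(chr(b))
    return "".join(out)
```

Each coded word costs one byte in place of the space plus the word itself, which is where the saving comes from.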

Edited by Ed Minchau

On 2/19/2022 at 11:57 AM, desertfish said:

Have you tried using lzsa and the Kernal's memory_decompress() routine? If it works, it would at least save you the need to include your own decompression routine.

Hi @desertfish!

I didn't remember that there was such a function in the Kernal.

I did a quick test just now.

Compiling the lzsa utility from the GitHub master worked without any issues on macOS.

I compressed the help text with the following command, as advised in the X16 PRG:

  • lzsa -r -f2 <original_file> <compressed_file>

The output is a binary file that cannot be handled by a normal text editor. I imported the compressed file with the .incbin directive supported by ca65.

The original file was 1,817 bytes, and the compressed file became 1,149 bytes (63 % of the original). Surprisingly, my own Huffman implementation did better, resulting in a compressed size of 1,093 bytes (60 % of the original).

But as you said, I need not store my own decompression tables or routines, and the Kernal routine, which worked perfectly by the way, will in the end be more efficient.

My Huffman code decompresses directly to screen VRAM. As far as I understand, this is not possible with the Kernal lzsa decompress function. So you first need to decompress to RAM, and then copy from RAM to VRAM to actually display the text.


I also looked briefly at the method described by @Ed Minchau

I would assume that in order to gain compression, a word that is replaced by a code needs to:

  • be longer than one character
  • occur more than one time

Analyzing my help text there are:

  • 25 words occurring 2 times
  • 5 words occurring 3 times
  • 5 words occurring 4 times
  • 2 words occurring 5 times
  • 1 word occurring 6 times
  • 1 word occurring 7 times
  • 1 word occurring 8 times
  • 1 word occurring 9 times

A total of 41 words.
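A tally like this takes only a few lines with Python's collections.Counter; a sketch of the kind of analysis (illustrative, not the exact script):

```python
import re
from collections import Counter

def word_stats(text):
    """Histogram: occurrence count -> number of distinct words seen that often."""
    words = re.findall(r"[a-z]+", text.lower())
    occurrences = Counter(words)
    # Keep only words occurring at least twice, since single occurrences
    # cannot gain anything from a vocabulary code.
    return Counter(c for c in occurrences.values() if c >= 2)
```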

I might try to combine lzsa compression with Ed's method to see where I end up.


On 2/23/2022 at 11:49 PM, Stefan said:

I also looked briefly at the method described by @Ed Minchau [...] I might try to combine lzsa compression with Ed's method to see where I end up.

It's not just the word that you're compressing here. Every word has a preceding space, so if you encounter one of these special codes, print a space and then the word. You end up getting the word plus the preceding space at the cost of just one byte, the same as just a space. So even one-character words like a or I can be compressed this way.

Edited by Ed Minchau

Yes, I guess that is possible.

But you cannot assume that every word boundary is marked by a blank space. In my help text, for instance, words may be preceded by line breaks instead of blank spaces. The decompression routine could, of course, take that into account, and output a preceding blank space only if the previous character was not a line break. This might somewhat reduce the gain of replacing one-letter words with a code.


I did not have a lot of success with Ed's word method.

All words occurring more than once were included in the test, even one-letter words. These words were replaced by a byte code. The blank space preceding a byte code was stripped. The list of words was added to the end of the text block, as this list is needed for decompression. The resulting file actually grew from 1,818 to 1,895 bytes.

Ignoring one-letter words was a bit better, resulting in a file of 1,650 bytes. But compressing this with lzsa resulted in 1,298 bytes, still not better than just using lzsa.

The effectiveness likely depends on the characteristics of the text, especially how often the same words are repeated. Or maybe I'm doing something wrong...
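In hindsight, the outcome can be estimated up front. Assuming the word list is stored alongside the compressed text with a two-byte pointer per word (as in the test described above; Ed's VRAM layout changes where the cost lands, not its size), the net saving per vocabulary word is roughly:

```python
def net_saving(word, occurrences):
    """Rough net bytes saved by coding ' word' as one byte.
    Each occurrence shrinks from len(word) + 1 bytes (space + word)
    to one code byte; the word itself plus a two-byte pointer must
    be stored once for decompression."""
    saved_per_use = len(word)      # (len + 1) - 1 code byte
    table_cost = len(word) + 2     # word text + 16-bit pointer
    return occurrences * saved_per_use - table_cost
```

A one-letter word occurring twice is then a net loss (2 - 3 = -1 bytes), so the table overhead can easily eat the gain, which matches the file growing in the test.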


  • 3 weeks later...

I uploaded a new version of X16 Edit today (0.4.4).

It uses lzsa to compress the help text. It is decompressed during program initialization using the built-in Kernal function. Version 0.4.3 was 15,532 bytes, and the new version 0.4.4 is 14,897 bytes, a difference of 635 bytes. There are also some other savings from my ongoing code cleaning. I really want the program not to go over the 16 kB boundary, as this would cause a serious problem for the ROM version.

Apart from that, the new version is mostly bug fixes:

  • The old code that read a file into the text buffer did not work properly in all circumstances. As discussed in another thread here, the Kernal function OPEN does not return an error even if the file is not found. The error is not raised until you try to read the first byte. If the file does not exist, bit 2 of READST (read timeout) is set. The old code did work in most cases, but would fail if the file contained just one character.
  • In the previous version, the ROM-based X16 Edit could not read the custom keyboard shortcut config file (X16EDITRC). The cause was that the name of the config file was stored within the executable, and so was no longer available after switching to the Kernal ROM bank when trying to open it. There was a similar problem in the function reading directory listings, where the "file name" "$" was also stored within the executable. Both problems were solved by copying the file names to RAM before calling Kernal OPEN.

  • 3 weeks later...
On 3/28/2022 at 6:05 PM, Stefan said:

The VERA memory layout used by the Kernal was updated earlier today.

Yes, but those are just the default addresses for the different locations of bitmaps, tilesets, charset, sprite data etc. for mode 128. It's always better to read those base addresses from the VERA registers at F9C0 and use them instead of hard-coding screen access. Software will then adapt automatically; no rewrite needed.

I prefer to define my VERA layout myself anyway; that way my software always knows where everything is located.


Tested the new fast loader Kernal function ("MACPTR") in X16 Edit.

Haven't done any optimization, just getting it to work.

Results (manual timing) opening/reading a file of 169 kB in size:

  • Old byte by byte reading: 10.8 seconds => 15.6 kB/s
  • New MACPTR block reading: 2.5 seconds => 67.6 kB/s

Quite a significant difference.

