Jump to content

r39 patched Kernal to fix LOAD into HiRAM functionality


ZeroByte
 Share

Recommended Posts

One of the annoying things about using SD card mode is that LOAD fails when loading into banked RAM.

As many know, when using the emulator without an SD image, it performs a hyperload directly into memory w/o emulating the actual file transfer. It does this by skipping the kernal routine entirely. When using SD card images, the Kernal runs as normal.

Using hyperload, you can issue a command to BASIC like this:

LOAD "MYFILE",8,3,$A000
This loads MYFILE into bank 3 of Hi Ram, and it is automatically continued into banks 4,5,6... as necessary. This operation fails when using Kernal (SD card attached). The same works/fails conditions apply when calling SETNAM,SETLFS,LOAD from within programs.

I got really really tired of this, so I decided to fix it in the Kernal. As I'm not a code god, I'd like to see if anyone experiences any instabilities or crashes with it before I do a pull request to include the fix on the main repo. The ROM image is attached below. Use it with emulator R39 or Box16. It will not work in R38.

I tested the performance with my additional code and the impact is negligible. I haven't checked what the return values look like after a LOAD into Hi RAM. The Kernal's output message gives the memory address where the load finished, and the final bank remains active, so I presume the return values in X/Y are similar. There's not much space available in the ROM so I'm trying to keep from adding too much to it.

Let me know if you experience any crashes with it that don't happen using the current repo version of R39.

Thanks.

EDIT: Dec 05, 21:17 UTC  (3:17pm Central time): Updated the attached ROM to fix a bug reported by @desertfish

EDIT2:

UPDATE - I've re-done the patch with a much smaller code size, saving over 60 bytes vs. my previous patch. The new one is MUCH easier code to read and follow, and it's only slightly slower. For a 96KB file I'm testing with, the previous patch took 60 jiffies (frames) to load, while the new routine took 63 jiffies. This speed penalty only happens on loads into banked memory, so I call that a win. I have updated the file here again.

If you downloaded the previous ones, then I recommend that you re-download this one, as it's the one I'm using for the pull request into the Kernal repo.

 

 

rom-r39-fixedload.bin

Edited by ZeroByte
Updated the ROM file with latest fix
  • Like 3
  • Thanks 4
Link to comment
Share on other sites

Some questions from the Discord channel:

The X16-docs repo is not up-to-date regarding the behavior of load. It doesn't document the VLOAD command, and explains loading into VRAM using old LOAD syntax (and old VERA banking schemes which no longer exist).
Do not be misled by this errata. The intended behavior of LOAD (according to emulator behavior, and comments in the BASIC ROM source)  is that it should have 2 forms:

  • LOAD "FILENAME"[,device[,redirect]]
    • redirect is the familiar ,1 which means LOAD will use the 2-byte file header as the load address, not start-of-BASIC
  • LOAD "FILENAME"[,device[,bank,address]]
    • This is the way to specify a target memory location for LOAD - the 2-byte header is simply skipped and the remainder of the file is loaded to the specified address

It is this second form which I've fixed. It previously worked for destinations in main memory, but would result in bank0 being overwritten insted of the intended bank. Bank overflow was also not supported - the LOAD would just continue up into the ROM region.

EDIT: I've tested the first form with my patch, and I'm happy to report that this version is also fixed. If the file's 2-byte header results in a LOAD into banked memory, the loading behavior is the same as if you had specified the load address in the LOAD command itself. The one difference for this form is that the active bank will become the first one used during the LOAD - i.e. if bank 6 is active, then the file will LOAD into 6,7,8....

Edited by ZeroByte
  • Like 1
Link to comment
Share on other sites

One more note: my patch has a corner case that is not fixed: if a LOAD begins in low memory and proceeds up to the BANK area, crossing over IO space, it will usually write some overflow data into RAM bank0 (the Kernal's space) instead of the selected bank.

Fixing this would require one of the following:

  1. adding more code to the patch (the DOS bank is almost full so this is undesirable) to handle this case
  2. changing the algorithm a bit to reduce the code size and handle the case, but adding ~1000 CPU cycles per page of data transferred into HiRam

I think it's best to let the write spill into Kernal's RAM bank, as it's already a bad day if a LOAD is writing data into IO space. Go ahead and crash hard so developers/users figure out not to do whatever they've done.

What say ye, community? Do you agree with this line of reasoning?

Note that the ROM I've posted has one difference from my current revision. Current revision allows the $9Fxx write to corrupt the Kernal ram bank. The one posted here tried to fix that, but it still has a bug where the first byte of a write to $9Fxx will appear at $BFxx in bank-1. The rest go to IO space, or the beginning of the correct bank.

EDIT - this bug has been fixed.

Edited by ZeroByte
Link to comment
Share on other sites

On 12/3/2021 at 12:34 PM, ZeroByte said:

I think it's best to let the write spill into Kernal's RAM bank, as it's already a bad day if a LOAD is writing data into IO space. Go ahead and crash hard so developers/users figure out not to do whatever they've done.

I'm not for this approach.  I worked in the automation control industry and randomly writing into I/O space was just wrong and could cause permanent damage to man & machine. While I don't expect that anyone would have an X16 hooked up to an industrial welding robot, they many have their own hardware connected and not have designed in failsafes for random events. 

I'd suggest an approach that prevented a load crossing into I/O space or beyond. Just reject the load, or fail just at point of crossing into danger.

It's just safer & easier to prevent a crash then try to pick up the pieces afterwords.  

Link to comment
Share on other sites

On 12/3/2021 at 3:26 PM, desertfish said:

the current LOAD routine already walks right over the I/O area I believe, so it didn't get worse (but not better either in this regard).

Thanks for letting me know. I'm glad I've only got an emulator running. I'd hate to have hardware with this design flaw in it. 🙄

Link to comment
Share on other sites

On 12/3/2021 at 5:34 PM, Edmond D said:

Thanks for letting me know. I'm glad I've only got an emulator running. I'd hate to have hardware with this design flaw in it. 🙄

A: "It's not a bug, it's a feature."

B: "But it's caught on fire..."

 

  • Like 1
Link to comment
Share on other sites

I did find a way to fix my modifications to the copy routine such that it now correctly handles writes to the $9Fxx page, sending any offset bytes that reach the bank window into the intended bank. The fix only added 2 bytes to the size in doing so. The DOS ROM has virtually no free space left in it, so I'm trying to be as frugal with space as possible. The normal non-bank-area copy routine could be shrunk by a few bytes but would add ~500 clock cycles per page transferred, so I opted to leave it alone.

I've updated my fork with this latest fix.

As for making LOAD throw errors if it reaches IO space: Adding error conditions to LOAD is beyond the scope of this patch. Do keep in mind that for microcomputers like this, it's assumed that everything works a certain way and expects you to do things properly w/o any safety nets due to resource constraints. There just isn't enough time or space to cover everything that might go wrong for a general-purpose system. Anyone using a 6502 to control critical systems would be writing bespoke code and running it on bare metal and thoroughly testing it. There's also the case where a LOAD into HIRAM which walks off the end of the installed memory chips could happen, but the code doesn't check that either. There's also no way to know you aren't overwriting something else important like other data you didn't intend to overwrite or zeropage, or the stack, or the kernal/basic area of memory which also happens to include the IRQ/BRK/NMI vectors...

EDIT: I've updated the ROM file on this post to include this bugfix.

Edited by ZeroByte
Link to comment
Share on other sites

On 12/3/2021 at 6:58 PM, ZeroByte said:

... As for making LOAD throw errors if it reaches IO space ...

The more period-correct way to avoid that kind of overwrite would be to end the load if it hits $9F00 ... maybe set carry to indicate the load was abandoned early.

If that is built into the binary page adjustment, then you need the binary page adjustment to always take place on a true binary page boundary ... so if, eg, the start address was $8080, then the destination page would be $8000 with $80 in the index register, and when the index register hits zero, increment the high byte of the page address and check if it equals $9F ... if so, jump to the "load finished" handling.

If the START address was in the I/O page, that wouldn't change anything. Which is also period-correct. If you start with an I/O page address, you better know what you are doing.

Edited by BruceMcF
Link to comment
Share on other sites

The page thing is more optimized than that for block reads. It only checks twice per block (once in LOAD after the block transfer and once in DOS prior to doing the copy.

The DOS routine now sets a ZP variable = Y index where bank swap happens. It then does a loop only having to add cpy ZP per byte.

if the data ptr is < 9f00 or > BFFF then it uses the original loop that doesn’t waste time checking for page boundaries

LOAD calls block read repeatedly and updates its own pointer after each call, so that pointer must get wrapped down but this check only happens roughly 4 times per KB.

it would be easy enough to early exit if initial call was >= C000 and return out of memory.error, and maybe the same for page 9F. Personally I prefer the don’t do this or it crashes method. 😁

  • Like 1
  • Thanks 1
Link to comment
Share on other sites

On 12/3/2021 at 7:44 PM, BruceMcF said:

If that is built into the binary page adjustment, then you need the binary page adjustment to always take place on a true binary page boundary ...

Re-reading your post, I noticed this bit.

Actually, the destination pointer during the block copy routine is NOT page-aligned at all. And by extension, neither is the shell loop in the LOAD routine itself. LOAD could probably force the issue by requesting a block read of N bytes to become page aligned, but that would be a lot of logic to run for each iteration of the loop, and it would be fighting against the fat32 driver's attempts to keep itself page-aligned, and a full 256-byte block transfer would almost never happen. Since LOAD is high level and DOS is low level, it's probably best for DOS to get to control the flow.

In debugging, I found that the very first block copy of a LOAD usually transfers $FE bytes and not $100 bytes. (PRG header removed from first transfer). Thus, when i was loading multi-bank data starting at A000, I found that the first write was FE bytes, then $100 for a few more writes. So the first bank boundary crossing was actually a 256-byte write starting at BFFE. Interestingly, this means that it's probably quite common for most of the bytes copied to incur the +1 clock cycle "penalty" for page crossing in (ZP),Y addressing mode.

I'm not sure whether the source buffer is page-aligned, but I'm fairly certain that it must be.

For anyone interested, the reason the 2 skipped PRG header bytes causes this is that the DOS keeps a kind of cache in bank0. It loads from SD one cluster at a time - I didn't go further down the rabbit hole to see how that operation happens and whether it scraps the buffer if you switch between two open files or anything like that. I just know that LOAD calls it, it returns all or part of the current page of buffer. When LOAD sees that it has not received all of the data, it calls "macptr" again which gets directed into fat32_read, which empties the next page of buffer. If at any point in this loop, the buffer is empty, it refills it with one cluster of data (512 bytes) until that's drained, etc.

P.s. - if you told me 2 years ago that I'd end up this deep in a file I/O routine, I'd've probably laughed at you

  • Like 2
Link to comment
Share on other sites

On 12/3/2021 at 11:08 PM, ZeroByte said:

... it would be easy enough to early exit if initial call was >= C000 and return out of memory.error, and maybe the same for page 9F. Personally I prefer the don’t do this or it crashes method. 😁 ...

 

On 12/4/2021 at 1:31 AM, ZeroByte said:

Re-reading your post, I noticed this bit.

Actually, the destination pointer during the block copy routine is NOT page-aligned at all. And by extension, neither is the shell loop in the LOAD routine itself. LOAD could probably force the issue by requesting a block read of N bytes to become page aligned, but that would be a lot of logic to run for each iteration of the loop, and it would be fighting against the fat32 driver's attempts to keep itself page-aligned, and a full 256-byte block transfer would almost never happen. Since LOAD is high level and DOS is low level, it's probably best for DOS to get to control the flow. ...

Yes, the second is a pretty strong argument for the "don't do this" approach.

You wouldn't want to go a page at a time and fail out when the load is within 256 bytes of $9F00, since you want to be able to load up to EXACTLY $9EFF. So the game is not worth the candle. The place to do boundary checks is higher up than this routine ... in the code that is using this routine to load something into RAM.

Link to comment
Share on other sites

I think honk the optimal way to stop load at 9EFF would be to add one more check to the block copy routine: It does a series of checks to limit transfer size to the smallest of: full page, bytes in buffer, file bytes remaining, bytes requested by caller.

Add one more check: if page is 9E then limit to $100-ptr_low.

This would make LOAD start the next loop at 9F00 where it can error out.

I still hold the position that this extra range checking functionality adds too much size and performance impact to DOS and the Kernal.

 

  • Like 1
Link to comment
Share on other sites

From the (limited) time I spent with this patched kernal it seems to work fine for me!  So yeah go ahead with the pull-request on github.  This is quite an important one, as it will probably fix a whole lot of data-load issues when running from sd-card 😐 

(also I want to integrate it with my custom v39 rom build, because a few other patches are in there as well)

edit: found an issue when loading to banked addresses when LSB=$02 , thinks work ok if LSB=$00

Edited by desertfish
  • Like 1
Link to comment
Share on other sites

On 12/4/2021 at 10:46 AM, ZeroByte said:

... I still hold the position that this extra range checking functionality adds too much size and performance impact to DOS and the Kernal.

I reckon that ACTUALLY Kernel way to do it would be to offer a Kernel call that does a sanity check on the operands of a load, returning carry clear if the load stays within a "zone" (Low RAM, I/O Page, High RAM window), carry set if it WOULD cross a boundary.

And let the programmer decide whether to sanity test or not.

Link to comment
Share on other sites

On 12/4/2021 at 3:53 PM, BruceMcF said:

I reckon that ACTUALLY Kernel way to do it would be to offer a Kernel call that does a sanity check on the operands of a load, returning carry clear if the load stays within a "zone" (Low RAM, I/O Page, High RAM window), carry set if it WOULD cross a boundary.

And let the programmer decide whether to sanity test or not.

Unfortunately the kernal doesn't know how much it's loading until it has loaded. As least in as much as the kernal works with serial IEC devices.

  • Like 2
Link to comment
Share on other sites

On 12/4/2021 at 8:28 PM, Scott Robison said:

Unfortunately the kernal doesn't know how much it's loading until it has loaded. As least in as much as the kernal works with serial IEC devices.

Oh, wait ... yeah, the block size count is in the directory system in the 1541, not in the Kernel. Finding it for the SD filesystem might not be too challenging, but on the IEC bus, it's entirely arbitrary how the connected bus records file sizes.

Edited by BruceMcF
Link to comment
Share on other sites

On 12/4/2021 at 6:34 PM, BruceMcF said:

Oh, wait ... yeah, the block size count is in the directory system in the 1541, not in the Kernel. Finding it for the SD filesystem might not be too challenging, but on the IEC bus, it's entirely arbitrary how the connected bus records file sizes.

Exactly. And the internal DOS is just an abstracted interface to fit with the kernal API (I think).

Edited by Scott Robison
Link to comment
Share on other sites

On 12/3/2021 at 3:37 PM, kelli217 said:

There's a limit to how sophisticated these things can be, given speed and size constraints of retro or retro-like systems.

There's limit to anything on every platform retro or not. The balancing of protections and limitations is an art in my opinion. I'm risk adverse and want to be able to choose a safe path first over convenience. That's me 🙂

The retro factor I think shouldn't be a limitation on a safe approach. I've never crashed my VIC 20, but I spent a career helping people address crashes on PCs. Part may be the complexity of the technology, but some of it I feel is mindset that detaches actions from consequences. The technology has progressed over the years, the mindset should evolve too. 

The people in the X16 community seems to pride itself on providing sound technology and ingenious solutions. While no solution is perfect, coming up with a workable solution that protects the user in most cases is desirable. I see that Zerobyte has put some protection into the code, which I appreciate.

 

Link to comment
Share on other sites

@desertfish found another bug where a full-page copy would misdirect byte 0 when destination address is not page aligned and bank wrap occurs. His test bed caught this on loads to $A002, but it would have happened on any non-page-aligned block copy of 256 bytes. Fortunately I was able to save some other bytes and this patch only added a few more net bytes in size.

I’ve patched this and will update the attached image shortly.

@Edmond D I do want to make code that behaves as advertised. A LOAD that allows you to bulldoze the stack is one thing, but a low-level routine that just glitches in certain cases is much less acceptable. In the former case, you can at least look at the file size beforehand and throw an error before calling LOAD. But the latter is something you'd have no control over from the outside.

And as @Scott Robison points out, there's no way to pre-check down to the last byte whether a read into $9E01 or not, but there is a possible way to do it. The macptr routine accepts .A=num_bytes. If .A=0 then the device decides how many. fat32_read limits to 256 bytes. I would have to go re-read the code with a non-zero amount in mind to see what would happen if an arbitrary amount reached fat32_read, but in theory, LOAD could see that it's in page 9E and then request only enough bytes to fill the page. If the return value is "still loading" then throw error.

I agree that this might make more sense with a "safe load" frontend than to encumber the primary load function with this extra overhead.

The changes to fat32.s were necessary for bank ram loading, even if I'd implemented a different HLOAD command analogous to VLOAD, but for HI-RAM loading.... the block copy was borked and needs to work that way regardless of the front-end packaging.

Edited by ZeroByte
  • Like 1
Link to comment
Share on other sites

On 12/5/2021 at 12:26 PM, ZeroByte said:

@Edmond D I do want to make code that behaves as advertised. A LOAD that allows you to bulldoze the stack is one thing, but a low-level routine that just glitches in certain cases is much less acceptable. In the former case, you can at least look at the file size beforehand and throw an error before calling LOAD. But the latter is something you'd have no control over from the outside.

I appreciate that you're improving the system by dealing with the issues that are controllable. Having hidden issues that only occur in certain cases can be problematic. Your later case is certainly understandable and not preventable. 

While not ideal, the documentation around using this load should explain the potential risk that might be encountered in using the command. You've mentioned the documentation is out of date, so if the pull goes through hopefully updating the documentation does as well. That suggest to me that you're doing a complete job in terms of code and documentation, something that I (and hopefully others) fully appreciate. Thanks!

On a side note, I do fully understand the X16 is not an industrial control system platform. Industrial and embedded control space does require certain knowledge, understanding and experience which most people in hobby world don't have. I feel that the PI/Ardunio platforms make real world control systems more accessible, but with more risk due to the lack of the user working in that space. I'd hate to hear of someone hurting themselves (or an X16) because of that gap. Lack of shipping hardware prevents incidents at this time. 

Finally, I'm still on R38. I do hope that R39 (or better) comes out while the team works on the hardware issues. I'm not blocked from enjoying what is available, but I believe that a updated version would serve the community. 

  • Like 1
Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

 Share

×
×
  • Create New...

Important Information

Please review our Terms of Use