
"Big" data files


lamb-duh

Question

I have 1MiB of data I'd like accessible to a program. Obviously, this can't all be loaded into memory at one time, so I'd like to be able to load it into RAM in chunks of ~1KiB. Has anyone done something like this before?

As far as I can tell, my options are to load an entire file into RAM using LOAD, or to read the file byte by byte using CHRIN. Doing a seek and then a bunch of CHRIN calls should work, but there doesn't seem to be a seek command (at least for Commodore), and using CHRIN itself to seek is not going to be fast enough (or at least I assume it wouldn't be).

One strategy I've come up with is to split the file up into 256 chunks of 4KiB, then load files individually. But this is suboptimal for two reasons: first, there's a big mess of data files; second, each file needs to contain 4KiB of data, but LOAD needs an extra 2-byte header, which will effectively double the disk space needed on a filesystem with 4K blocks (it's also using bigger chunks than I wanted, but that's not really a big deal).

Is there something better I don't know about?

Edited by lamb-duh

Recommended Posts


You are better off splitting up the file into individual assets that you can load independently. You can use filenames with hex numbers to make it easier to select them. Megabyte files are really just not a thing in the 8-bit world, unless you are doing linear tape data processing, which is a super fun thing to do in your free time.
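For example, loading one hex-named asset with the standard KERNAL calls might look like this (a minimal 64tass-style sketch; the file name "DATA0F.BIN", device 8, and the label names are assumptions, not anything from a real project):

    lda #fname_end-fname  ; file name length
    ldx #<fname
    ldy #>fname
    jsr $FFBD             ; SETNAM
    lda #1                ; logical file number
    ldx #8                ; device 8 (the SD card / host filesystem)
    ldy #0                ; secondary address 0: load to the address in X/Y
    jsr $FFBA             ; SETLFS
    lda #0                ; 0 = load (not verify)
    ldx #<$A000
    ldy #>$A000           ; destination address
    jsr $FFD5             ; LOAD
    bcs load_error        ; carry set on error

    fname:     .text "DATA0F.BIN"
    fname_end: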



Note that loading into High RAM allows spilling over from one segment to the next, so you could break the file up into 32KB chunks and load 4 High RAM segments at a time. Then with 4KB sectors each file only carries a one-sector penalty for the leading $A000 header: an extra 4KB per 32KB, or 12.5%.

I expect to be storing 128KB block files in chunks of 8KB, with the index in hexadecimal ... .B00 to .BFF (so the last block chunk ends up being "best friends forever") ... 8KB is a single High RAM segment, but it's more that I want cross-compatibility with the C64, so I value the ability to load the set of blocks containing the target block in the shadow of the BASIC ROM.
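A sketch of pulling in one 32KB chunk under those assumptions (the zero-page bank register comes up later in the thread; on the R37 emulator it's $9F61 instead, and the chunk_seg0 name is hypothetical):

    ; (SETNAM/SETLFS done as usual for the chunk's file)
    lda #chunk_seg0   ; first of the chunk's four 8KB High RAM segments
    sta $00           ; select the RAM bank
    lda #0            ; load, not verify
    ldx #<$A000
    ldy #>$A000
    jsr $FFD5         ; LOAD; past $BFFF the data spills into the next segment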


1 hour ago, SlithyMatt said:

@BruceMcF I have tested loading 190k files into banked RAM and it worked beautifully.

Yes. I mention 32KB chunks because they are a handy size from a pointer perspective. You know that your native "within this block" pointer has its high bit set, so "elsewhere in this chunk" can have the high bit clear, and then there are two free bits available to say which segment of the chunk it is in, so that's four segments. A "BPL NEARBY" is the only run-time overhead for the in-segment case; for cross-segment access within the same chunk you do something like:

    NEARBY: STA Z           ; stash the pointer high byte
            TAX
            LDA CHUNK,X     ; segment within the chunk (b5/b6 moved down to b0/b1)
            ORA CURR_CHUNK  ; combine with the chunk's base segment
            STA $0          ; select the RAM bank
            LDA Z
            AND #$1F        ; offset within the 8KB segment
            ORA #$A0        ; re-base into the $A000-$BFFF window
            BRA -           ; rejoin the common path

Where CHUNK is a table that gives b5 and b6 in b0 and b1 with all other bits clear, and CURR_CHUNK is the segment index of segment 0 of the chunk.
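Such a CHUNK table can be generated at assembly time; a sketch in 64tass (which comes up later in the thread), covering the 128 possible high bytes with b7 clear:

    CHUNK:  .for i := 0, i < 128, i := i + 1
            .byte (i >> 5) & 3    ; b5/b6 of the pointer high byte -> b0/b1
            .next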

Then for a linked list your "FAR" pointer has b6 set, so a high byte in the $C000-$FFFF range. If cross-chunk landing points are always in segment 0, then for extended pointers you have (with the high byte just loaded into the accumulator, and only when working with multi-chunk pointers ... if you know it's a local pointer, you skip it):

            BPL NEARBY      ; b7 clear: elsewhere in this chunk
            BIT #$40
            BNE FARAWAY     ; b6 set: cross-chunk pointer
            ...             ; fall through: native in-segment pointer


FARAWAY doesn't have to work with raw chunk addresses; it can be an index into a table, so you can have lazy loading, "least recently used" chunks overwritten, etc. "ASL : ASL" and you have an index into a table with four bytes of data on the status of each chunk. That gives you the ability to conveniently manage up to 2MB of data with a table never bigger than a page ... whether you have 512KB or 2MB in the current system.
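A sketch of what the FARAWAY side might look like under those assumptions (the CHUNKTAB layout and the labels are hypothetical):

    FARAWAY: AND #$3F       ; strip b7/b6: a 6-bit chunk index, up to 64 chunks
             ASL
             ASL            ; times 4: each chunk has four status bytes
             TAX
             LDA CHUNKTAB,X ; e.g. byte 0 = base segment, or 0 if not loaded yet
             BEQ LOADCHUNK  ; miss: lazy-load the chunk (evicting the LRU one)
             ...

64 chunks of 32KB is the 2MB ceiling, and the four-bytes-per-chunk table stays within one page.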

I had a lot of experience with LRU buffering back in the 80s working with a fig-Forth that had block files as REL files, and even without any fastload, on the glacially slow 1541, loading 1KB at a time when the LRU kicked in was a lot more responsive than loading a whole file. So with 8MHz and data on SD card, I expect lazy loading data into a set of 32KB LRU data buffers in High RAM would be really snappy.

Edited by BruceMcF

On 7/16/2020 at 10:25 AM, SlithyMatt said:

You are better off splitting up the file into individual assets that you can load independently. You can use filenames with hex numbers to make it easier to select them. Megabyte files are really just not a thing in the 8-bit world, unless you are doing linear tape data processing, which is a super fun thing to do in your free time.

well to be fair, SD card controllers are also a little out of place in the 8-bit world. it seems a shame to have so much storage capacity attached and not be able to make full use of it. actually, a virtual tape drive that stores data on SD would be awesome.

do you use a lookup table to map 0-15 to PETSCII? I have files with hex names right now, but haven't written the loader for them yet. thinking about it now, it might be a good idea to use letters A-P instead, then I can map a hex digit with i -> 'A'+i, as in the sketch below.
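In 65C02 terms that mapping is just an add; a sketch for turning a block index in A into two letters (the fname offsets are made up):

    pha
    lsr
    lsr
    lsr
    lsr             ; high nibble
    clc
    adc #'A'        ; 0 -> 'A' ... 15 -> 'P'
    sta fname+4     ; hypothetical position in the name buffer
    pla
    and #$0F        ; low nibble
    clc
    adc #'A'
    sta fname+5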

15 hours ago, BruceMcF said:

Yes. I mention 32KB chunks because they are a handy size from a pointer perspective. You know that your native "within this block" pointer has its high bit set, so "elsewhere in this chunk" can have the high bit clear, and then there are two free bits available to say which segment of the chunk it is in, so that's four segments. A "BPL NEARBY" is the only run-time overhead for the in-segment case; for cross-segment access within the same chunk you do something like:

This is all great information, thank you! All I need right now is to resolve a 16-bit pointer to a chunk of data (just a really big array), but I'm definitely going to come back to this when I need more structure.

I'm sticking with 4KiB+2 chunks right now because it ended up being exactly the right size for me to work with easily (I'm not thrilled with the filesystem overhead, but whatever, it's going on a 32GB SD card ffs). Each record is 16 bytes, which works out to 256 pages of 256 records each, so I don't need to do any bit manipulation to split a record number into a page number and index. And then, because I only have two slots in each RAM bank, I can LSR the slot number to set the RAM bank, then use the carry flag to write one of two base addresses into the zero page.
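That resolution might look something like this (a sketch: ptr is a hypothetical zero-page pair, the bank register is assumed to be at $00, and the INC accounts for slots starting at bank 1, as the next post mentions):

    ; A = page/slot number, X = record index within the page
    lsr             ; bank = slot/2, carry = odd/even slot
    inc a           ; slots start at bank 1 (bank 0 is skipped)
    sta $00         ; select the RAM bank
    lda #$A0        ; even slot: base $A000
    bcc +
    lda #$B0        ; odd slot: base $B000
+   sta ptr+1
    txa
    lsr
    lsr
    lsr
    lsr             ; index/16: which page of the 4K slot
    ora ptr+1
    sta ptr+1
    txa
    asl
    asl
    asl
    asl             ; index*16 mod 256: offset within the page
    sta ptr         ; (ptr) now points at the 16-byte record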



Okay, I have this mostly working except for one issue I just can't figure out. I'm splitting each RAM bank into two 4K slots (skipping bank 0). These slots are numbered consecutively such that even slots are at $A000 and odd slots are at $B000. Everything's working in the even slots, but when I try to read data out of odd slots I just get zeros.

I'm debugging by using `jmp *` to stop execution at particular points, then inspecting memory dumps. I have checked that the correct memory bank is selected and that I have the right address. I even tried using an absolute address (because I'm copying data with an indirect index and I'm still hesitant with that). The last thing my program executes before spinning is to copy $bcc7 into the zero page. I definitely have the correct page selected, and a zero gets copied across. However, when I look at the vram dump, address $3cc7 is not a zero; the dump shows that the data I expect to be there is there.

 

Any ideas what might be going wrong?

edit: I mean banked RAM, not VRAM

Edited by lamb-duh

1 hour ago, StephenHorn said:

Wait, when and why did we start looking at VRAM? Are you loading files into himem just so you can copy them into the VERA?

oops, I mean banked RAM, not VRAM.

(I am copying into VRAM though: the 4K chunks are being loaded into banked memory, and then tiles are copied out of there into VRAM individually as they are needed. So data in even slots is successfully being copied, data in odd slots is not. I've narrowed the issue down to reads from *banked* RAM.)

Edited by lamb-duh

10 minutes ago, StephenHorn said:

The other thing, I'm not sure if this was a copy-paste error or typo or what, is that you refer to $bcc7 as the source you're copying from, and $3cc7 as the address you're comparing with. These are not the same address, $3cc7 isn't even in himem...

the first address is where I'm reading from on the x16, the second address is what I'm checking with on the banked RAM dump. They *are not* the same, but it's very possible that I calculated the wrong offset in the file. This is RAM bank 1, which should span $2000-$4000 (in the dump file), so $3cc7 corresponds to $bcc7 (on the x16). When I look at the dump file, this range is full of the data that I loaded into it.


3 minutes ago, lamb-duh said:

the first address is where I'm reading from on the x16, the second address is what I'm checking with on the banked RAM dump. They *are not* the same, but it's very possible that I calculated the wrong offset in the file. This is RAM bank 1, which should span $2000-$4000 (in the dump file), so $3cc7 corresponds to $bcc7 (on the x16). When I look at the dump file, this range is full of the data that I loaded into it.

 

Ah, now I'm starting to understand. Please understand that I prefer to look at memory dumps with the built-in debugger (`-debug` command line option, then F12 to pause and show the debugger, F5 to resume execution), so to me a banked RAM address would normally be written like "$01:A000". Also, the built-in debugger would be a great tool for determining whether your code is actually doing what you think it is: instead of constantly rebuilding with `jmp *` instructions, set a breakpoint in the debugger by going to the address in code (`d 080d` in the debugger's command line, for example) and then pressing F9 to set the breakpoint. When the CPU tries to execute the instruction at the breakpoint, it'll stop instead and pull up the built-in debugger. You can inspect the processor state, dump portions of memory (`m a000`), look at different RAM banks ("+" and "-" on the keyboard)... there's all kinds of useful tools there.

 



Hm, the debugger is showing that the data is where I think it is, but I still can't get at it from my program. Here's the last thing that's run (and I verified in the debugger that it stopped here):


    lda #1       ; select RAM bank 1 (zero-page register $00)
    sta 0
    lda $bcc7    ; read one byte from banked RAM
    sta r0       ; r0 = $02
    stz r0+1
    sei
    stx $ff
    jmp *        ; spin here so memory can be inspected

So, set ram bank to #1, then copy $bcc7 to $02. If I look at the zero page in the debugger, $02=0, but if I `m 01bcc7`, I can see the data that I expected to be there.

 

By the way, is it possible to interact with the debugger from the command line? The interface has some accessibility issues.


3 hours ago, lamb-duh said:

    lda #1
    sta 0
    lda $bcc7
    sta r0
    stz r0+1
    sei
    stx $ff
    jmp *

So, set ram bank to #1, then copy $bcc7 to $02. If I look at the zero page in the debugger, $02=0, but if I `m 01bcc7`, I can see the data that I expected to be there.

Are you running with the R37 emulator? Using the zero page to set the banks is new to the emulator and requires you to build from source. Otherwise you need to write to the legacy RAM bank register: $9F61.
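That is, on R37 the same bank select would be (a two-line sketch):

    lda #1
    sta $9f61   ; R37: the legacy RAM bank register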


59 minutes ago, SlithyMatt said:

Are you running with the R37 emulator? Using the zero page to set the banks is new to the emulator and requires you to build from source. Otherwise you need to write to the legacy RAM bank register: $9F61.

The debugger is showing me that the data is correctly loaded into bank 1, which was selected using the same zero page variables

edit: I am running r37 though

Edited by lamb-duh

4 minutes ago, lamb-duh said:

The debugger is showing me that the data is correctly loaded into bank 1, which was selected using the same zero page variables

edit: I am running r37 though

If you are really running R37, bank 1 is not getting selected by storing 1 to $00. You need to use $9F61, so either something else is setting $9F61, you are reading the debugger incorrectly, or you are running a build from a newer baseline than R37.


1 hour ago, SlithyMatt said:

If you are really running R37, bank 1 is not getting selected by storing 1 to $00. You need to use $9F61, so either something else is setting $9F61, you are reading the debugger incorrectly, or you are running a build from a newer baseline than R37.

Sorry, I shouldn't be so dismissive of that. You're right, and you definitely just saved me a lot of debugging over an issue that I wouldn't have realized was there until this one was solved. This issue, though, is all happening within the same RAM bank.

I've reproduced this in a much smaller program. You can assemble it with 64tass, or there's a binary in there too. When the program gets to the end, the accumulator should have a non-zero value read from the file that was just loaded into memory; instead it is zero. However, if I `m 01b010` in the debugger, it shows me the data I expect to be there (although I won't discount the possibility that I'm reading the debugger all wrong). If you change the address to $a000 instead of $b000, the accumulator has the expected value.

load.zip

Edited by lamb-duh


@lamb-duh Not sure if I got this right: so in the end you now have split up the file into smaller files - and then load those?

I have a similar task ahead, and I would personally prefer not to split the files. I would end up with hundreds of files, and that might be too much for the Kernal (too many files in a directory), right?

I tried quite a lot, but OPEN always returns 5 - 'device not ready' (I'm using cc65), so I cannot use GETIN or CHRIN.

Am I right that the sequence should be:

SETLFS, SETNAM, OPEN, CHKIN, and then a lot of GETIN?
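In KERNAL terms, the sequence I mean is roughly this (a sketch using the stock KERNAL entry points; the file name, device, and labels are placeholders, and see the next reply about emulator support for OPEN/GETIN):

    lda #fname_end-fname
    ldx #<fname
    ldy #>fname
    jsr $FFBD       ; SETNAM
    lda #2          ; logical file number
    ldx #8          ; device
    ldy #2          ; secondary address: a data channel
    jsr $FFBA       ; SETLFS
    jsr $FFC0       ; OPEN
    bcs open_error  ; carry set on error
    ldx #2
    jsr $FFC6       ; CHKIN: make file 2 the input channel
loop:
    jsr $FFE4       ; GETIN: next byte into A
    ; ... store the byte somewhere ...
    jsr $FFB7       ; READST: bit 6 = EOF
    and #$40
    beq loop
    lda #2
    jsr $FFC3       ; CLOSE
    jsr $FFCC       ; CLRCHN

fname:     .text "BIGDATA.BIN"
fname_end: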



Yeah, I broke the big file up into chunks that are loaded in full. It runs well in the emulator, but I have no idea how it's going to behave when it also has to locate those files in a FAT filesystem. I'm not terribly satisfied with the solution, but it's working now.

It seems that OPEN/GETIN doesn't work in the emulator yet; you have to use LOAD.

The emulator comes with symbol definitions for CBDOS, which seems to have more robust filesystem routines, including `fseek`. I'm not sure if it's in a usable state though, since I haven't come across any documentation for it.

edit: I don't think FAT has any limit on how many files can be in a directory, except for its file size limitation (which is way bigger than any directory listing). However, directory listings are just an array of filenames and pointers, so actually locating a file in a large directory becomes a problem. Using short filenames should mitigate this somewhat.

Edited by lamb-duh


I think this depends on how SD card and host file system access are implemented in the end. I just know that, for example, CP/M and also MS-DOS 1.0 (2.0 too?) had a limit of 64 files in any directory (one disk sector). I think the C64 Kernal must have similar limitations, but I don't know for sure.

FAT12 also had a limit of at most 4095 files (the number of FAT entries on disk). One of my upcoming projects might require at least 4MByte of bitmap data (20 screens of 8bpp 320x240, plus sprite data) and music data (20 tunes of 100k each). That would be over 500 files - if I have to split them up, that's not ideal. I'd rather have around 40-50 100kByte files.


Quote

actually, a virtual tape drive that stores data on sd would be awesome.

That sounds kind of fun. I suppose a driver program would need:

(1) A buffer to store the name of the T64 file it's accessing from the SD.

(2) A destination pointer.

(3) An EOF flag.

(4) A block-offset into said T64 file. If that's an unsigned int, then you could access up to 4MB.

(5) Routines to "mount and open" the volume (maybe), scan the next filename, read a block into the destination address, skip the file.

(6) Similarly, write operations.

(7) An "Experience API" that prints the tape "directory" to the screen.  Actually that would also be nice for debugging and development...

#4 and #5 could be gathered into a nice little jump table as well.
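A sketch of that jump table (all routine names hypothetical):

    tape_api:
        jmp t64_mount   ; (5) mount/open the volume
        jmp t64_next    ; (5) scan the next filename
        jmp t64_read    ; (5) read one block to the destination pointer
        jmp t64_skip    ; (5) skip past the current file
        jmp t64_write   ; (6) write one block
        jmp t64_dir     ; (7) print the tape "directory"

Callers would reach entry k at tape_api + 3*k.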

 

 

Edited by rje


I've just found the same limitation: cc65 lacks fseek() for the X16.

I thought it might be really nice to have one absolutely gigantic file and just fseek() around to get at its data. I mean, that solves lots of problems for me.

Instead, I guess I'll have to have a zillion little files, which means I'll need to store them in a lot of subdirectories.

 


50 minutes ago, rje said:

I've just found the same limitation with CC65 lacking fseek() for the X16.

I thought that it might be really nice to have one absolutely gigantic file, and just fseek() around to get at its data.  I mean, that solves lots of problems for me.

Instead, I guess I'll have to have a zillion little files.  Which means I guess I'll need to store them in a lot of subdirectories.

The reason there isn't a generic seek in CBM DOS is the way files are built. Seeking to a given offset in a file is an O(N) operation, because you have to read the first block, use it to find the track and sector of the second block, and lather, rinse, repeat until you get to block N where the data you want lives.

CBM DOS did include relative files, which supported a record-number-based seek for records of up to 254 bytes (if I remember the limits correctly).

FAT-based file systems have a global file allocation table, so one can follow the chain of clusters much more quickly (assuming a relatively fragmentation-free file system image and enough memory to hold the entire FAT). This is much easier said than done, though, because per Microsoft the FAT must cover a minimum of 65,525 clusters to qualify as FAT32. Given that each entry is 32 bits, that means we're looking at about 256KB. A highly fragmented file could be almost as bad to seek in from the perspective of an X16.

Relative files had the benefit of being processed on the drive. The C64 only had to say "give me record X" and by magic the drive satisfied the request. FAT32 really isn't designed with a 6502 in mind.

The potentially good news is that there is code in ROM for a fat32_seek operation: https://github.com/commanderx16/x16-rom/blob/a200d6266038fc5ff506280e70383e5774bd0ac9/dos/fat32/fat32.s ... this should make it possible to seek at some point, even if it isn't implemented in the cc65 library today.
