Roman K

65c02 addressing modes usage

Question

I was playing a little with x86 assembly when I was a kid and now SUDDENLY decided to try some retrocomputing. The X16 was a pretty obvious reason, so I started learning the 6502 assembly. And that's HARD though I'm having a lot of fun.

The problem is that lots of books and tutorials answer the question 'WHAT' but never 'WHY'. Like they patiently explain how to work with different addressing modes but never when and why to use them. Book after book, tutorial after tutorial - everywhere I get the same info that is almost useless without the practical examples, not just "how to address that stuff".

I want to share my understanding of the potential uses of the different addressing modes and ask for clarification or better ideas where possible.

As we know, there are 16 addressing modes in total; 3 of them were added in the 65C02.


1 - 3: Immediate, Accumulator and Implied modes are not technically 'addressing modes' as they have nothing to do with getting the memory location.

The JMP and JSR instructions and their modes are somewhat special. JMP really means "load the program counter" (and JSR does the same, but first saves the return address on the stack). So what we call Absolute mode for JMP is technically Immediate (take the immediate 2-byte value and put it in the PC), and what we call Indirect mode is technically Absolute (take the address stored at that location and put it in the PC). The same goes for the Absolute Indexed Indirect mode, JMP (abs,X), added in the 65C02: it is effectively Absolute,X (take the address stored at that location plus the X offset and put it in the PC). That's not a critical observation, but it's quite important to me, and I've seen it mentioned only once. I'm not going to make exceptions for these instructions below, though.


So. 
4. Relative mode. Used only by the branch instructions, so it serves for conditional or unconditional (BRA) short jumps and loops (it's used often, but has no other uses). 
If the loop body is really big and the offset range of -128 to +127 bytes is not enough, it can be combined with a long jump (JMP abs).
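
A minimal sketch of that combination, in ca65 syntax. The labels `counter` and `far_target` are made up for illustration. The usual trick is to invert the condition so that the short branch only hops over an absolute JMP:

```asm
; Goal: "branch if counter == 0" to a target more than 127 bytes away.
; counter and far_target are hypothetical labels for this sketch.
        lda counter
        bne skip          ; inverted test: short hop over the JMP
        jmp far_target    ; reached only when counter == 0; any 16-bit address
skip:
        ; execution continues here when counter != 0
```

The cost is three extra bytes and a few cycles on the path that takes the JMP.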


5. ZP relative mode is much less popular. It tests a variable in zero page and is used only by the BBR/BBS instructions for the same purpose - short branches, but now depending on the value of a specific bit. 


6. Absolute mode. The most important one, as it's supported by the vast majority of instructions. Compared with high-level languages, using absolute mode is like defining simple variables and performing operations on them.

Quote

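inc someVar      ; read-modify-write directly on the variable
ora someVar      ; A = A OR someVar
lda someVar      ; A = someVar
clc              ; clear carry before adding
adc #20          ; A = A + 20
sta someVar      ; someVar = A
rts              ; return before the data so it is never executed

someVar:
.byte $FE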


and so on.
The assembler turns labels into absolute addresses, based on the .org definitions. The downside of declaring variables this way is that they are static and increase the program size. We also mix code and data, and we cannot go beyond the user area (which ends at $9EFF).
By using banking we can have several instances of complex structures (up to 256) with instant switching between them and that gives us lots of opportunities. But that requires a kind of 'manual' memory management unlike the definition via labels. On the other hand, memory is not wasted, program size is not increased, going beyond the user area is possible.
JMP makes far jumps, JSR calls subroutines, that's ok.


7. Zero page mode. Mostly the same as absolute mode, but one cycle faster and one byte shorter. Half of ZP is already occupied by the system, so we have only 126 bytes, and 32 of those are given to the virtual registers, so just 94 bytes remain.
As CPU works at much higher speed and X16 has much more memory, ZP mode is not so usable now. Probably, we can use virtual registers with all supported commands but all other ZP space should be used for indirect modes with values just loaded and read via move commands.

Just recently I read about Sweet 16, need to investigate more. Just curious if it can share the same space that is used by virtual registers if I want to use it or I need to free some extra ZP memory for it.


8. Absolute,X mode allows working with arrays or strings and iterating over them in loops with the help of the X register. 
Slightly fewer instructions support it. Using labels brings more problems: we need to allocate more memory, which increases the program size, and a loop can run past the end of the array and overwrite code.
Working with zero-terminated strings is even worse - you never know where they end. And the X register is only 8 bits wide.


9. Absolute,Y mode is similar, but even fewer instructions support it. I can't think of many uses for it. Maybe walking an array from the end to the beginning in the same loop where X moves from the beginning to the end. Or processing an array with some dynamically adjusted offset.


10 - 11. I don't know a good use for the ZP,X (and the tiny ZP,Y) modes, due to the small size of zero page. If it's about arrays, why not use absolute addressing instead? I'd spend more time moving the data to and from ZP.


12. Indirect mode is used only by JMP instruction. And literally everywhere they tell you: Indirect JMP bug and that’s all 🙂
The benefit of this mode is that you can calculate the address during the execution. Though that’s achievable via self modified code:

Quote

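LDA #<addr          ; low byte of the target address
STA jump+1          ; patch the operand low byte (jump+0 is the opcode)
LDA #>addr          ; high byte of the target address
STA jump+2          ; patch the operand high byte
jump:    JMP initial_address  ; operand is self-modified at run time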

So the good use is calling some OS routines via indirect calls with possible replacements or additions (like custom handlers). The problem is that there is no indirect JSR.
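
The missing indirect JSR is commonly worked around with a small trampoline. A sketch in ca65 syntax, assuming a hypothetical 2-byte RAM variable `vector` and a hypothetical routine `my_handler` that ends with RTS:

```asm
; vector and my_handler are hypothetical names for this sketch.
        lda #<my_handler
        sta vector
        lda #>my_handler
        sta vector+1

        jsr call_vector   ; behaves like the missing "JSR (vector)"
        rts               ; the handler's RTS returns to this point

call_vector:
        jmp (vector)      ; JSR already pushed the return address
```

Because the JSR pushes the return address before the indirect JMP runs, the handler's RTS comes back to the caller as if it had been called directly.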

13. Indirect, X - simplifies handling of jump tables or something like complex switch instructions. 
14. (ZP), Y indirect mode looks very promising when it comes to arrays or other blocks of memory that need to be dynamically created. Something like copying memory blocks from one destination to another. 

And two remaining addressing modes are quite unclear to me. 

15. (ZP) indirect introduced in 65c02 is a complement to indirect mode used by JMP instruction (with limitation to ZP only), so we have that orthogonality Absolute addressing (including jump) - Indirect addressing (including jump) but due to ZP size limitations I can hardly imagine the good usage for it.

And finally the most complicated 16. (ZP, X) mode that looks totally unusable.
 

Sorry if that looks too verbose or primitive to you, but I'm trying to be systematic in my understanding of the CPU and assembly.

1 hour ago, Roman K said:

I was playing a little with x86 assembly when I was a kid and now SUDDENLY decided to try some retrocomputing. The X16 was a pretty obvious reason, so I started learning the 6502 assembly. And that's HARD though I'm having a lot of fun.

The problem is that lots of books and tutorials answer the question 'WHAT' but never 'WHY'. Like they patiently explain how to work with different addressing modes but never when and why to use them. Book after book, tutorial after tutorial - everywhere I get the same info that is almost useless without the practical examples, not just "how to address that stuff".

1 - 3: Immediate, Accumulator and Implied modes are not technically 'addressing modes' as they have nothing to do with getting the memory location.

The JMP and JSR instructions and their modes are somewhat special. JMP really means "load the program counter" (and JSR does the same, but first saves the return address on the stack). So what we call Absolute mode for JMP is technically Immediate (take the immediate 2-byte value and put it in the PC), and what we call Indirect mode is technically Absolute (take the address stored at that location and put it in the PC). The same goes for the Absolute Indexed Indirect mode, JMP (abs,X), added in the 65C02: it is effectively Absolute,X (take the address stored at that location plus the X offset and put it in the PC). That's not a critical observation, but it's quite important to me, and I've seen it mentioned only once. I'm not going to make exceptions for these instructions below, though.

It's interesting to me that you would write this, as I've recently been thinking about alternative syntax for an assembler, something that was very assignment driven. For example, instead of "JMP ADDR" it might use "PC = ADDR" or instead of "LDA ADDR" it might use "A = [ADDR]". But that's not important, just thoughts that inform my reading of your comments.

I think of the CPU as having the following registers (PC [PCL and PCH], S, P, A, X, Y, MA [MAL and MAH]). The MA (memory address) "register" isn't in any documentation I've ever read, and it's not something we can directly access like we can with PC via JMP. But it is a part of any memory access.

As for the difference in absolute JMP vs other absolute addressing instructions, I imagine it to work like this:

"JMP ADDR" is the same as "PC = ADDR"

"LDA ADDR" would work out to something like "MA = ADDR" followed by "A = [MA]"

If we combine the two ideas for JMP indirect, it takes "JMP (ADDR)" and conceptually works out to "MA = ADDR" "PCL = [MA+0]" "PCH = [MA+1]".

Reading about how the 6502 processes instructions step by step through each cycle, you can sort of see this working out. http://nesdev.com/6502_cpu.txt has some descriptions of the cycle by cycle processing. Using its description of LDA:

Cycle 1: fetch opcode byte: "OPCODE = [PC++]"
Cycle 2: fetch low byte of effective address: "EAL = [PC++]"
Cycle 3: fetch high byte of effective address: "EAH = [PC++]"
Cycle 4: read from effective address: "A = [EA]"

If we look at JMP through a similar lens:

Cycle 1: fetch opcode byte: "OPCODE = [PC++]"
Cycle 2: fetch low byte of effective address: "EAL = [PC++]"
Cycle 3: fetch high byte of effective address: "EAH = [PC++]"
Phantom cycle: set PC: "PC = EA"

The reality is undoubtedly more complex, this is just how I visualize it.

2 hours ago, Roman K said:

The assembler turns labels into absolute addresses, based on the .org definitions. The downside of declaring variables this way is that they are static and increase the program size. We also mix code and data, and we cannot go beyond the user area (which ends at $9EFF).

The ca65 assembler provides solutions to several of these issues. It provides code and data segments that are stored separately in memory. It also provides a segment for uninitialized values, which is not saved as part of the program.
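
A rough sketch of what that looks like in a ca65 source file. The segment names CODE, DATA and BSS are the conventional ones; where each segment actually lands in memory depends on the linker configuration in use, and the labels here are invented for illustration:

```asm
.segment "CODE"
start:
        lda initial       ; copy the initialized value...
        sta scratch       ; ...into uninitialized RAM
        rts

.segment "DATA"           ; initialized data, stored in the program file
initial:
        .byte $FE

.segment "BSS"            ; reserved only; takes no space in the program file
scratch:
        .res 1
buffer:
        .res 256
```

The 257 bytes reserved in BSS do not make the file on disk any bigger; only their addresses are assigned at link time.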

2 hours ago, Roman K said:

As CPU works at much higher speed and X16 has much more memory, ZP mode is not so usable now. Probably, we can use virtual registers with all supported commands but all other ZP space should be used for indirect modes with values just loaded and read via move commands.

The Commander X16 Programmer's Reference Guide contains documentation on how the 16-bit virtual registers are used. I believe that the best way to use zero page is to use the virtual registers for temporary values used only for indirection, and to store permanent variables in the rest of zero page. Even at 8 MHz, the cycles saved by using zero page variables do add up, and when performing time-sensitive operations such as servicing raster interrupts, zero page is very useful. As every instruction that provides absolute addressing also provides zero page addressing (except JMP and JSR), it's a good idea to use as much of zero page as possible.
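
To put numbers on that saving, here is a small comparison in ca65 syntax. The two addresses are assumptions chosen for illustration ($22 from the user part of zero page, $0400 from ordinary RAM); the byte and cycle counts are the standard ones for these instructions:

```asm
counter_zp  = $22         ; assumed free zero-page byte
counter_abs = $0400       ; assumed free byte outside zero page

        lda counter_zp    ; zero page: 2 bytes, 3 cycles
        lda counter_abs   ; absolute:  3 bytes, 4 cycles
        inc counter_zp    ; zero page: 2 bytes, 5 cycles
        inc counter_abs   ; absolute:  3 bytes, 6 cycles
```

One cycle per access looks small, but inside an inner loop or an interrupt handler it is paid on every pass.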

2 hours ago, Roman K said:

Working with zero-end strings is even worse - you never know where it ends.

A simple way to handle zero-terminated strings is to use a BEQ instruction immediately after loading the next byte of the string, allowing the program to break out of the loop.
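
A minimal sketch of such a loop in ca65 syntax. CHROUT at $FFD2 is the Kernal character output call; the sketch assumes it leaves X untouched, and the label names are made up:

```asm
CHROUT = $FFD2            ; Kernal character output routine

print:
        ldx #0
next:   lda message,x     ; loading sets the Z flag when the byte is 0
        beq done          ; terminator reached: leave the loop
        jsr CHROUT        ; assumption: X survives this call
        inx
        bne next          ; safety net: give up after 256 bytes
done:   rts

message:
        .byte "HELLO", 0
```

The `bne next` after `inx` also covers the case of a missing terminator: when X wraps around to 0 the loop ends instead of running forever.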

2 hours ago, Roman K said:

Indirect mode is used only by JMP instruction. And literally everywhere they tell you: Indirect JMP bug and that’s all 🙂

I believe the 65C02 corrected this bug.

2 hours ago, Roman K said:

And finally the most complicated 16. (ZP, X) mode that looks totally unusable.

I agree that this addressing mode has very few uses. Given that the 6502 removed almost any feature that made the CPU too complex, I would assume that this addressing mode was included because it was very simple to implement, not because it was actually useful.

1 hour ago, Elektron72 said:

I agree that this addressing mode has very few uses. Given that the 6502 removed almost any feature that made the CPU too complex, I would assume that this addressing mode was included because it was very simple to implement, not because it was actually useful.

It is useful if you want to use zero page as a stack of indirect pointers to memory. It wasn't as useful on Commodore models because they seemed to use zero page almost at random for the kernal and BASIC, but given the care that has been taken to keep as much of zero page available for the user as possible, it could be useful on the X16. I've not done it myself, but I've seen / read coverage of the technique. A stack-based language could store pointers to its operands on a software stack in zero page, move the stack pointer with simple inx inx / dex dex pairs, and then load the values indirectly. Say you had a stack-based language that needed to add two bytes (to create a silly example):

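ldx #0
lda #<addr1     ; low byte of the first pointer
sta $02,x

inx
lda #>addr1     ; high byte of the first pointer
sta $02,x

inx
lda #<addr2     ; low byte of the second pointer
sta $02,x

inx
lda #>addr2     ; high byte of the second pointer
sta $02,x

dex             ; X = 2: top pointer (addr2) sits at $04/$05
lda ($02,x)     ; A = byte at addr2

dex
dex             ; X = 0: next pointer (addr1) sits at $02/$03
clc             ; clear carry before adding
adc ($02,x)     ; A = byte at addr2 + byte at addr1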

Note: not assembled or tested but I think it is basically correct.

Now, of course you'd never do this in straight assembly language. You'd use absolute addressing. But what if you don't know in advance what the addresses are? Perhaps they are pointer variables in a high level language and there is more to getting the values than just lda #value (like pointer math). Or you could use self modifying code to create a much more compact representation. Unless your code is in ROM.

It's funny ... as a teen I had such a hard time with the concept of indexed indirect and indirect indexed. Nothing like almost 40 years of writing software in a number of languages including x86 assembly & C & C++ to have a better grasp of how some of these addressing modes would map in useful ways when one isn't writing assembly language directly.

The same thing was true of multiplication & division of bytes (which we don't get for free in 6502) or multiple precision math... for me anyway, it is one of those things that is hard to grasp until one is forced into doing it (I needed to write 64 bit math operations for 16 bit DOS for a scripting language I wrote for my employer when working on PCBoard back in the day).

9 hours ago, Scott Robison said:

I think of the CPU as having the following registers (PC [PCL and PCH], S, P, A, X, Y, MA [MAL and MAH]). The MA (memory address) "register" isn't in any documentation I've ever read, and it's not something we can directly access like we can with PC via JMP. But it is a part of any memory access.

That can probably lead to some better abstraction instead of existing one, that can be reused in some other mid-level language. I'm still searching for one.

 

9 hours ago, Elektron72 said:

The Commander X16 Programmer's Reference Guide contains documentation on how the 16-bit virtual registers are used.

I found them used in only two or so calls. That's quite inconsistent. I cannot rely on that space as a register extension, as the registers can be changed by some Kernal calls. But saving a 'register page' is somewhat expensive, especially if we want to do it from time to time. That's why I am thinking about the possibility of switching maybe not the whole zero page, but only a block of 16 or even 32 registers in zero page, via a mechanism similar to the one used for banking. With a kind of Sweet 16 VM on top of it, probably even extended to be a Sweet 32. But that's a completely different machine architecture. 🙂


I think you've got the hang of the most important stuff already! Well done 😉 I'm just going to comment on a few things from my experience with 65c02.

12 hours ago, Roman K said:

8. Absolute,X mode allows working with arrays or strings and iterating over them in loops with the help of the X register. 
Slightly fewer instructions support it. Using labels brings more problems: we need to allocate more memory, which increases the program size, and a loop can run past the end of the array and overwrite code.
Working with zero-terminated strings is even worse - you never know where they end. And the X register is only 8 bits wide.


9. Absolute,Y mode is similar, but even fewer instructions support it. I can't think of many uses for it. Maybe walking an array from the end to the beginning in the same loop where X moves from the beginning to the end. Or processing an array with some dynamically adjusted offset.

I use these two modes together a lot, because I often need to handle data from two different arrays in the same loop. So I store one array index in X and the other in Y, LDAing and STAing from/to both arrays, and do the rest of the logic and calculations using the accumulator. You won't need this for Hello World, but as soon as more complex problems are tackled, I think this will become very useful.
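
One way this might look in practice is a reversed copy, sketched here in ca65 syntax. The labels `src` and `dst` and the length `LEN` are invented for the example:

```asm
LEN = 16                  ; assumed length, must be 128 or less for BPL

reverse_copy:
        ldx #0            ; X walks src forwards
        ldy #LEN-1        ; Y walks dst backwards
loop:   lda src,x         ; Absolute,X read
        sta dst,y         ; Absolute,Y write
        inx
        dey
        bpl loop          ; Y wraps to $FF after 0, which ends the loop
        rts
```

With only one index register the loop would have to save and reload it on every pass; having both X and Y keeps the loop body down to two memory accesses and two index updates.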

 

12 hours ago, Roman K said:

14. (ZP), Y indirect mode looks very promising when it comes to arrays or other blocks of memory that need to be dynamically created. Something like copying memory blocks from one destination to another. 

Indeed very useful. Any code that needs to access data structures that can be in different places in memory will benefit from this addressing mode. This does include dynamically allocated data structures, but I personally have used it mainly for accessing multiple "hard coded data structures" defined by .byte directives.
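
A small sketch of that pattern in ca65 syntax, assuming $22/$23 is a free zero-page pair. The record labels and their contents are invented for illustration:

```asm
ptr = $22                 ; assumed free zero-page pair ($22/$23)

; Point ptr at one of several records, then read fields by offset.
        lda #<record_b
        sta ptr
        lda #>record_b
        sta ptr+1

        ldy #0
        lda (ptr),y       ; field at offset 0 of the selected record
        ldy #2
        lda (ptr),y       ; field at offset 2 of the selected record
        rts

record_a: .byte 1, 2, 3
record_b: .byte 10, 20, 30
```

The code that reads the fields never mentions `record_a` or `record_b`; switching records only means storing a different address in `ptr`.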

 

12 hours ago, Roman K said:

13. Indirect, X - simplifies handling of jump tables or something like complex switch instructions. 

Yep. I found this being quite useful, as well. In order to make the most out of this, you need an efficient way to determine X, e.g. to simply load it from a variable. Otherwise, you would be better off using something like an "if-else-chain", (of course, that means using a bunch of conditional branches in 65c02 assembly).
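
A sketch of such a jump table in ca65 syntax, using the 65C02-only JMP (abs,X). The variable `command` and the three handler labels are hypothetical, and the sketch assumes `command` has already been range-checked:

```asm
; command is a hypothetical variable holding 0, 1 or 2.
dispatch:
        lda command
        asl a             ; entries are 2 bytes wide, so index = command * 2
        tax
        jmp (handlers,x)  ; 65C02 only: jump through the selected entry

handlers:
        .word cmd_stop, cmd_play, cmd_pause
```

The dispatch cost stays the same no matter how many entries the table has, whereas a chain of compares and branches grows with every case.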

Edited by kliepatsch
3 hours ago, Roman K said:

I found the usage in only two or so calls. That's quite inconsistent. Like, I cannot rely on that space as being register extension, as they can be changed by some Kernal calls.

The documentation states that r6-r10 are saved no matter which kernal routine is called, so those are safe to use. r0-r5 are only changed if the kernal routine being called uses them as return values, so these can be used depending on which kernal calls are executed. Finally, while r11-r15 are officially designated as scratch registers, they will only be overwritten if a kernal call is made, so they are very useful for temporary values.


(ZP,X) mode = array of data stored in ZP, get item X.
(ZP),Y mode = ZP pointer to a block of data in memory, get byte Y.

(ZP),Y is the classic way to have a routine that iterates through some arbitrary block of memory. E.G.: A music player has several pattern data blocks in memory, and whenever a tune is using pattern 14, it loads the base address of pattern 14 into ZP and the current offset into the pattern is loaded into Y. Then as the algorithm iterates through the voices, it moves the current pattern / index into ZP and Y. (This is what Rob Hubbard's Monty On The Run player routine does).
(ZP),Y is one of the slower modes though. You can gain a little speed by using self-mod code techniques to use ABS,X instead. I.e. you overwrite the ABS address in your code with the base address of the pattern data, instead of copying it into a ZP pointer. This is typically 1 or 2 cycles faster, and for a music player, you want that to be as fast as possible to save CPU for all those flashy graphic tricks instead.
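
A sketch of that self-modifying variant in ca65 syntax. The label `pattern14` is hypothetical, and the code has to run from RAM for the patch to work:

```asm
; Before playing a pattern, patch its base address into the operand.
        lda #<pattern14
        sta fetch+1       ; operand low byte (fetch+0 is the opcode)
        lda #>pattern14
        sta fetch+2       ; operand high byte

        ldx #0            ; X = offset into the pattern
fetch:  lda $FFFF,x       ; $FFFF is a placeholder, overwritten above
```

LDA abs,X takes 4 cycles against 5 for LDA (zp),Y (each plus one on a page crossing), so the saving is paid back on every byte fetched after the one-time patch.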

(ZP,X) would be useful to a SID player to store the frame delays in ZP. Since a voice not playing a sound should result in a NOP as quickly as possible (not a literal NOP CPU instruction, but a NOP for the algorithm), having the frame delay timers in ZP is quite fast. Use X to cycle through the voices, and have BEQ PlayVoice, else INX then loop. You can store all voices right there in ZP (if you have the space in ZP to play with).

Let's not forget that if you really really want to get the most out of the HW, your program can go "bare metal" mode and not use the Kernal for anything. If that's the case, then 100% of ZP is yours to play with as you see fit, but now you have to bit-bang the joystick ports yourself. 😉

On 5/3/2021 at 5:45 PM, ZeroByte said:

Let's not forget that if you really really want to get the most out of the HW, your program can go "bare metal" mode and not use the Kernal for anything.

You mean some modes are less usable by developer due to CX16 architecture solutions and memory map? So I should just avoid some of them?

19 minutes ago, Roman K said:

You mean some modes are less usable by developer due to CX16 architecture solutions and memory map? So I should just avoid some of them?

I believe that ZeroByte is saying that as the kernal uses various system resources (e.g. half of zero page, various other pages in memory, and processing time due to the default IRQ handler), you may be able to push the system further by disabling it entirely. However, given the amount of resources available on the X16, disabling the kernal is likely not worth the additional programming difficulty in most situations.

6 hours ago, Elektron72 said:

I believe that ZeroByte is saying that as the kernal uses various system resources (e.g. half of zero page, various other pages in memory, and processing time due to the default IRQ handler), you may be able to push the system further by disabling it entirely. However, given the amount of resources available on the X16, disabling the kernal is likely not worth the additional programming difficulty in most situations.

I concur with that assessment.

The reality is that any general purpose interface will be suboptimal for some or maybe most problems. You can almost always come up with a "better" interface for some value of "better". However, that "better" interface will itself be worse for yet other classes of problems than the general purpose interface.

So yes, by disabling the kernal, writing your own IRQ / NMI / BRK handlers, and bit banging all your own I/O, you can potentially come up with a solution that is better than the kernal would be. But it would mean a lot more time and effort. Sometimes the time and effort is worth it, which is why we wound up with so many fast load schemes for the C64 that avoided using the kernal for loading, choosing instead to implement their own interfaces.

Edited by Scott Robison

Not to bang the drum too much on the KERNAL and I/O, but I want to use this diversion to quote this bit from Wikipedia of all places:

Quote

Surprisingly, the KERNAL implemented a device-independent I/O API not entirely dissimilar from that of Unix or Plan-9, which nobody actually exploited, as far as is publicly known. Whereas one could reasonably argue that "everything is a file" in these latter systems, others could easily claim that "everything is a GPIB-device" in the former.  [...]

...no system call exists to "create" an I/O channel, for devices cannot be created or destroyed dynamically under normal circumstances. Likewise, no means exists for seeking, nor for performing "I/O control" functions such as ioctl() in Unix. Indeed, the KERNAL proves much closer to the Plan-9 philosophy here, where an application would open a special "command" channel to the indicated device to conduct such "meta" or "out-of-band" transactions. For example, to delete ("scratch") a file from a disk, the user typically will "open" the resource called S0:THE-FILE-TO-RMV on device 8 or 9, channel 15. [...]

...With all relevant KERNAL system calls vectored, programmers can intercept system calls to implement virtual devices with any address in the range of [32,256]. Conceivably, one can load a device driver binary into memory, patch the KERNAL I/O vectors, and from that moment forward, a new (virtual) device could be addressed. So far, this capability has never been publicly known as utilized, presumably for two reasons: (1) The KERNAL provides no means for dynamically allocating device IDs, and (2) the KERNAL provides no means for loading a relocatable binary image. Thus, the burden of collisions both in I/O space and in memory space falls upon the user, while platform compatibility across a wide range of machines falls upon the software author. Nonetheless, support software for these functions could easily be implemented if desired.

(https://en.wikipedia.org/wiki/KERNAL#On_device-independent_I/O)

 

On 5/2/2021 at 5:10 PM, Roman K said:

 

15. (ZP) indirect introduced in 65c02 is a complement to indirect mode used by JMP instruction (with limitation to ZP only), so we have that orthogonality Absolute addressing (including jump) - Indirect addressing (including jump) but due to ZP size limitations I can hardly imagine the good usage for it.

If I have an address stored in two consecutive locations in zero page, then the address pointed to by that zero page location can be accessed using the indexed indirect mode, the indirect indexed mode, or the indirect mode equally easily. Suppose the zero page location containing the address pointer is $7C:

If X = $7C , then

LDA ($00,X) loads the data pointed to by the contents of $7C/$7D

Changing the value of X just changes which zero page location is the pointer. You're right, it isn't particularly useful, as there are only so many pointer locations available on zero page. 

If that address pointed to is the start of a lookup table, we can use Y to index into that table with

LDA ($7C),Y

Or if we don't want to use X or Y, and are just interested in the memory location pointed to by $7C/$7D, then

LDA ($7C)

On my META/L editor, I use the following code for the above three commands:

LDA+ 00

LDA- 7C

LDA/ 7C
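
Put together as a sketch in standard ca65 syntax (the label `table` is made up), the three forms below all load the same byte, the first byte of `table`:

```asm
        lda #<table
        sta $7C
        lda #>table
        sta $7D           ; $7C/$7D now point at table

        ldx #$7C
        lda ($00,x)       ; (ZP,X): pointer at $00 + X = $7C
        ldy #0
        lda ($7C),y       ; (ZP),Y: pointer at $7C, plus offset Y = 0
        lda ($7C)         ; (ZP): 65C02 only, no index register needed
```

They differ in what can vary at run time: X selects which pointer is used, Y selects an offset from a fixed pointer, and the plain (ZP) form leaves both index registers free.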

Edited by Ed Minchau
1 hour ago, Ed Minchau said:

Changing the value of X just changes which zero page location is the pointer. You're right, it isn't particularly useful, as there are only so many pointer locations available on zero page. 

Of course, there are 128 possible zero page pointers, and we only have a 16 bit address space, so it's really not as limiting as it might at first seem based on the architecture.

14 hours ago, Scott Robison said:

Of course, there are 128 possible zero page pointers, and we only have a 16 bit address space, so it's really not as limiting as it might at first seem based on the architecture.

Actually there's more than that available. The location doesn't need to be an even number. The only ones you can't use are 00 and 01 (because that's allocated for memory paging) and FF (because the next byte isn't on zero page, although that still may work, haven't tried it). So there's either 253 or 254 possible locations.

7 minutes ago, Ed Minchau said:

Actually there's more than that available. The location doesn't need to be an even number. The only ones you can't use are 00 and 01 (because that's allocated for memory paging) and FF (because the next byte isn't on zero page, although that still may work, haven't tried it). So there's either 253 or 254 possible locations.

While there may be 253 possible locations for pointers, overlapping locations can't be used at the same time. Therefore, there are only around 127 pointers usable at one time.

21 minutes ago, Ed Minchau said:

Actually there's more than that available. The location doesn't need to be an even number. The only ones you can't use are 00 and 01 (because that's allocated for memory paging) and FF (because the next byte isn't on zero page, although that still may work, haven't tried it). So there's either 253 or 254 possible locations.

Right, I wasn't talking about the X16 specifically, just the 65C02 in general. I'm sure a clever person could probably find a way to create some useful utility with overlapping pointers (much as the BIT instruction is used to change the following instruction in some cases), but in general, you are only going to get 128 distinct pointers in zero page at one time.

9 minutes ago, Elektron72 said:

While there may be 253 possible locations for pointers, overlapping locations can't be used at the same time. Therefore, there are only around 127 pointers usable at one time.

Yes, this is what I meant (except for the X16 specific portion of addresses 0 & 1 being used for banking, which I was ignoring for the general applicability to the 65C02, or really anything in the 6502 family).

Edited by Scott Robison
