You can optimize an LDr #0 / STr pair into a single STZ instruction, as long as the addressing mode of the store is ZP, ZP,X, ABS, or ABS,X. If you also need the zero in register r afterwards, for example to pass it to a subroutine (LDA #0; STA address; JSR subroutineThatTakesOnA), the load has to stay anyway and STZ is no smaller or faster than the plain store, so you could skip the optimization there for easier porting to an NMOS 6502 system.
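Here's a rough sketch of the difference, assuming ca65-style syntax. The label counter and its address are made up for illustration.

```asm
counter = $10           ; hypothetical zero-page variable

; NMOS 6502: clearing a byte costs you a register
        LDA #0
        STA counter

; 65C02: same effect on memory, and A keeps whatever it held
        STZ counter
```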
X and Y can now be pushed and pulled directly with PHX, PHY, PLX, and PLY. On an NMOS 6502, you have to transfer X or Y to the accumulator before pushing, and transfer it back after pulling.
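As a sketch of what that saves you when preserving X and Y around a routine (ca65-style syntax assumed, the routine body is a placeholder):

```asm
; NMOS 6502: everything goes through A, so A gets clobbered
        TXA
        PHA             ; push X
        TYA
        PHA             ; push Y
        ; ... routine body ...
        PLA
        TAY             ; pull Y
        PLA
        TAX             ; pull X

; 65C02: A is left alone
        PHX
        PHY
        ; ... routine body ...
        PLY             ; pull in reverse order of the pushes
        PLX
```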
You can branch unconditionally with the relative BRA instruction, which works the same way as any conditional branch, like the one you used accidentally.
You can increment or decrement the accumulator, while an NMOS can only increment/decrement X, Y, or memory contents. Slithy pointed out that CA65 just takes plain INC/DEC without an operand specified, but some other assemblers accept A as the operand, and others use the mnemonics INA and DEA.
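The 65C02 also adds #IMM, ZP,X and ABS,X modes to BIT. Note that BIT #IMM only affects the Z flag, unlike the memory modes, which also copy bits 7 and 6 of the operand into N and V. On an NMOS, testing against an immediate value requires the byte to be stored in a literal pool in memory, or the AND instruction to be used at the cost of destroying the value in the accumulator.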
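A minimal sketch of testing a single bit with the immediate mode, assuming ca65-style syntax. The names status and bit4_clear are hypothetical.

```asm
status = $10            ; hypothetical zero-page variable

check:  LDA status
        BIT #%00010000  ; Z = 1 if bit 4 of A is clear; A is not changed
        BEQ bit4_clear
        ; ... bit 4 was set, and A still holds the full status byte ...
bit4_clear:
        RTS
```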
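ANDing or ORing directly on memory can be done with the TRB and TSB instructions. TSB ORs the accumulator into memory, while TRB ANDs memory with the inverse of the accumulator, so the bits set in A are the ones that get cleared. Both also set the Z flag from A AND the old memory value, the same result BIT would give; just ignore it if you don't need it. The mask has to be in the accumulator, so back up A first whenever it holds something you still need.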
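Roughly, setting and clearing bits in place looks like this (ca65-style syntax assumed, flags is a made-up variable):

```asm
flags = $20             ; hypothetical zero-page variable

        LDA #%00000101
        TSB flags       ; flags = flags OR  %00000101 (sets bits 0 and 2)

        LDA #%00000100
        TRB flags       ; flags = flags AND %11111011 (clears bit 2)

; NMOS equivalent of the TRB above, for comparison:
        LDA flags
        AND #%11111011
        STA flags
```

In both 65C02 cases A still holds the mask afterwards, and Z tells you whether any of the masked bits were set before the operation.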
There's also a (ZP) mode added to all instructions that have (ZP,X) and (ZP),Y. This removes the need to set X or Y to zero just to dereference a pointer. For example, I can write a MEMCPY implementation (basically emulating LDIR on the Z80) with the source and destination pointers in zero-page, and a 16-bit loop counter in X and Y. On an NMOS, I would need to put one (or even both) halves of the loop counter in zero-page.
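One way such a MEMCPY could look, as an unoptimized sketch rather than the definitive version. It assumes ca65-style syntax, and the zero-page addresses and the calling convention (X = low byte of the count, Y = high byte) are my own choices for illustration.

```asm
src = $F0               ; 2-byte source pointer (hypothetical address)
dst = $F2               ; 2-byte destination pointer (hypothetical address)

; Entry: X = low byte of count, Y = high byte of count
memcpy: CPX #0
        BNE copy        ; low byte non-zero: bytes remain
        CPY #0
        BEQ done        ; both halves zero: finished
copy:   LDA (src)       ; 65C02 (ZP) mode, no index register needed
        STA (dst)
        INC src         ; advance the 16-bit source pointer
        BNE srcok
        INC src+1
srcok:  INC dst         ; advance the 16-bit destination pointer
        BNE dstok
        INC dst+1
dstok:  CPX #0          ; 16-bit decrement of Y:X
        BNE noborrow
        DEY             ; low byte is about to wrap, borrow from high byte
noborrow:
        DEX
        BRA memcpy      ; unconditional branch back to the zero test
done:   RTS
```

Both pointers are advanced in place, so save copies first if you need the original addresses afterwards.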
There's also a JMP (ABS,X) instruction added, which is very handy for jump tables.
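A small dispatch sketch, assuming ca65-style syntax. The handler names and the convention that A holds the command index are hypothetical.

```asm
; Entry: A = command index, 0 to 2
dispatch:
        ASL A           ; each table entry is a 2-byte address
        TAX
        JMP (table,X)   ; jump to the address stored at table+X

table:  .word cmd0, cmd1, cmd2

cmd0:   RTS             ; placeholder handlers
cmd1:   RTS
cmd2:   RTS
```

On an NMOS you'd typically push the target address and RTS, or copy it into a zero-page vector and use JMP (ABS) instead.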
There are also zero-page-only bit test-and-branch (BBR, BBS) and bit manipulation (RMB, SMB) instructions. These were initially only on Rockwell models before they were merged into the WDC design. You won't find them on 65C02s from any other manufacturers, or on a 65816, even though it has all the other 65C02 instructions.
WDC models also have the WAI and STP instructions, which put the processor into a low-power state. WAI stops execution until an interrupt occurs, and STP stops the processor until a reset occurs. No other 65C02s have these.
The 65C02 also fixes well-known NMOS bugs such as the infamous JMP ($xxFF) bug and the decimal mode flags bug. Some cycle counts were also changed.
Is this information helpful for you? Because I still see you LDA #0 before STA and such.