
BruceMcF

Members
  • Posts

    1072
  • Joined

  • Last visited

  • Days Won

    29

Everything posted by BruceMcF

  1. Since he organized the instruction set for ease of hand-assembly, with only CPR having the opcode it has for functional reasons, I do think that saving odd/even in a zero page byte, and cutting the size of the two vector tables in half is the most useful decode.
  2. I didn't have a node, but there was a time that I had a FIDOnet email address as my email address, via a local BBS.
  3. If it was sold at the same price or less than a 6502 at the time of 6502 introduction, possibly not ... the 6809 is a fine instruction set.
Regarding the original topic, I've been looking at something I mentioned in another thread. After looking more closely, the same routine can handle byte and word indirect load, the same routine can handle byte and word indirect store, and with an entry stub the same two routines also handle byte pop and byte store-pop (the stub runs before SIGN is examined to see whether to run load or store). So that's six operations with two routines. The same routine can handle add and subtract and, with an entry stub, compare. A single routine can handle direct load or store, and a single routine can handle increment and decrement. So that's seven more operations with three routines. Among the "embedded register" operations, only word pop (POPD) and SET are singletons: POPD decrements twice in the process, which cannot be handled by a prefix to indirect load or store (which post-increment), and setting a value with the contents of the accumulator doesn't make any sense. Even though each routine is longer than the SweetCX16 routines, cutting the number of routines to seven to cover 15 operations makes the codesize smaller.
In the dispatch, after branching off to handle the "Branch & etc." ($0n) ops, the bit4 value is used to set SIGN to $00 or $FF, then bit4 is cleared and four LSRs give one of eight index values from 0 to 14. Storing that in X allows JMP (REGOPS,X) on a 16 byte (rather than 30 byte) vector table, saving 14 more bytes. Handily, the index (an even number from 0 to 14) is in both A and X on dispatch, so if the index is used (as in indirect load and indirect store, to tell whether it's a byte or a word operation), you can do "TYX: TAY" to save the index where it can be tested directly with "CPY #n".
I haven't tackled the Branch operations, but I am thinking a similar process can be used with the low bit of the operand, since 8 of 13 are by pairs: Branch No Carry / Branch Carry; Branch Plus / Branch Minus; Branch Zero / Branch Nonzero; and Branch if Minus 1 / Branch if not Minus 1. If Carry, Minus, Nonzero, and non-Minus 1 are each tested with a result of #$00 if the condition is met and #$FF if the condition is not met, then jumping to BRANCH with EOR SIGN will invert the status for the "odd" operands (Carry, Minus, Nonzero, Not-minus 1), and leave the status alone for the "even" operands. Then a branch is performed if the result after EOR SIGN is #$FF. Branch Always simply calls BRANCH with a status of $00, since Branch Always is an "odd" op. So that handles 9 of 13 ops.
RTN is easy, since it is op $00: "CMP #0 : BEQ RTN". BK, RS and BS are all singletons, but the dispatch can use the "SIGN" value to distinguish between BK and RS and jump to BS on its own. So filter out RTN, extract SIGN based on the low bit, clear the low bit, transfer to X and do an X-indexed jump on a 14 byte index table ... rather than 26 in SweetCX16 ... which crunches the size even more. The hope would be to get smaller than the original Sweet16, so that there is a "faster, large footprint" version and a "slower, smaller footprint" version.
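The decode described in this post can be modelled in a few lines of Python. This is only a sketch of the idea, not the actual SweetCX16 source: the function names are made up for illustration, and it assumes the Sweet16 opcode layout with the operation in the high nybble and the register (or branch selector) in the low nybble.

```python
# Hypothetical model of the SIGN / index decode; all names are illustrative.

def decode_register_op(opcode):
    """Split a $1n-$Fn opcode into (sign, table_index, register)."""
    reg = opcode & 0x0F
    sign = 0xFF if opcode & 0x10 else 0x00  # bit4 picks the odd or even op of a pair
    index = (opcode & 0xE0) >> 4            # clear bit4, shift right 4: even values 0..14
    return sign, index, reg

def decode_branch_op(opcode):
    """Split a $01-$0C opcode into (sign, table_index)."""
    sign = 0xFF if opcode & 0x01 else 0x00  # low bit picks the odd or even op of a pair
    index = opcode & 0x0E                   # low bit cleared: even values 0..12
    return sign, index

def branch_taken(condition_met, sign):
    """Shared BRANCH entry: status is $00 if the tested condition holds, else $FF."""
    status = 0x00 if condition_met else 0xFF
    return (status ^ sign) == 0xFF          # EOR SIGN, branch when the result is $FF
```

For example, Branch Carry ($03) and Branch No Carry ($02) share table index 2 and differ only in SIGN, so one carry test serves both.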
  4. Yes, that's the basic idea ... routines used for system initialization. The original Sweet16 saved more space in the Apple ROM(s?) than Sweet16 used, so it was a "free" resource in a space consumption sense, at a time when ROM cost much more per KB than it does today. In the context of the X16, the most appealing aspect may be the ability to conserve on relatively scarce Low RAM if a Sweet16 VM is available.
  5. There ought to be differences throughout ... it is not attempting to be a port of Woz's code, it is attempting to be an open source VM that executes Sweet16 source, and beyond that is explicitly focusing on using a faster approach to routine dispatch. And in part it is explicitly pursuing a different speed / codesize tradeoff than Woz's Sweet16 because I doubt I could pursue Woz's specific goals and do any better than he did.
  6. Because the purpose of Sweet16 is most often to write very compact "setup" type code ... you are not intended to use Sweet16 inside inner loops executed many times, but in one-off startup processes that would take much more space in 6502 binary code, and since the code is only executed once, using Sweet16 doesn't have much runtime impact. If you are USING the result of the subtraction, you will typically need the result in the accumulator ... even if you want the result somewhere else, you will need it in the accumulator before "putting it somewhere else", so you will have to follow, eg, CPR R5 with LD R13, wasting a byte in your Sweet16 code for every subtract operation. Plus, what other SINGLE "register" operation do you need? There are already three spare non-register ops, and any of those COULD be implemented with an operand that refers to one register ... or even two registers. For instance, you could SWAP two registers with the source register index in the low nybble of the operand byte and the destination register index in the high nybble, or multiply two registers with the 32bit result replacing the operands in a similar way. Or you could have shift left and shift right with the low nybble giving the register and the high nybble giving the number of shifts, from 0 to 15. Edit: Actually, I may have convinced myself to substitute the extensions I have in the current source WITH those three ... a register swap and binary shift left and right.
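The two-nybble operand byte suggested here can be sketched in Python. This is a sketch under assumptions, not code from the SweetCX16 source: the function names are hypothetical, and the registers are modelled as a plain list of sixteen 16bit values.

```python
# Hypothetical operand-byte helpers; none of these names come from SweetCX16.

def unpack_operand(operand):
    """Return (high_nybble, low_nybble) of an operand byte."""
    return (operand >> 4) & 0x0F, operand & 0x0F

def op_swap(regs, operand):
    """SWAP: destination index in the high nybble, source index in the low nybble."""
    dst, src = unpack_operand(operand)
    regs[dst], regs[src] = regs[src], regs[dst]

def op_shift_left(regs, operand):
    """Shift left: shift count (0-15) in the high nybble, register in the low nybble."""
    count, reg = unpack_operand(operand)
    regs[reg] = (regs[reg] << count) & 0xFFFF

def op_shift_right(regs, operand):
    """Shift right: shift count (0-15) in the high nybble, register in the low nybble."""
    count, reg = unpack_operand(operand)
    regs[reg] = (regs[reg] & 0xFFFF) >> count
```

With this layout an operand byte of $35 would mean "shift R5 by 3" for the shift ops, or "swap R3 and R5" for SWAP.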
  7. It's also what you will get from a 6502 assembler that supports computed values if you do, eg, "0-512". Back in the 50s and 60s, there was a greater diversity of ways to represent negative values. The other signed representations used back in the day were one's complement, which is just inverting each bit (and which makes $FFFF the 16bit "negative zero") and signed magnitude, where one bit represents the sign and the rest is the absolute value. One's complement is still found in some dedicated signal processing hardware, but two's complement basically took over for most purposes starting in the 70s and is what is assumed as "normal" today. The 6502 is a bit funny in that it does subtraction as a one's complement machine would do ... invert all bits of the operand and add to the accumulator ... but by using "SEC" as the "clear borrow" instruction and "CLC" as the "set borrow" instruction, it WORKS as a two's complement subtraction.
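A small Python sketch can make the arithmetic concrete. It models binary-mode subtraction only (decimal mode is ignored), and the function names are illustrative rather than taken from any assembler or emulator.

```python
def twos_complement_16(value):
    """16bit two's complement encoding of a (possibly negative) integer."""
    return value & 0xFFFF

def ones_complement_16(value):
    """16bit one's complement: invert every bit."""
    return ~value & 0xFFFF

def sbc8(a, operand, carry):
    """8bit SBC in binary mode: add the inverted operand plus the carry flag.

    carry=1 (after SEC) means "no borrow"; carry=0 (after CLC) means "borrow".
    Returns (result, carry_out).
    """
    total = a + (operand ^ 0xFF) + carry
    return total & 0xFF, 1 if total > 0xFF else 0
```

So "0-512" assembles to $FE00, and SEC followed by SBC supplies the "+1" that turns the inverted operand into a two's complement negation.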
  8. Old school (1970s era) assemblers often required a location label to end with a ":" to distinguish it from a value defined by an equate ... it seems like most newer assemblers make the ":" optional and work out whether it's an equate or a location from the context. However, standardization of assembler syntax is more about what people are used to than having an explicit standard to follow, so YMMV. I always include the ":" ... that's more from habit than from any view on whether it is "best practice".
  9. Yes, this is the expense of a general stack frame stack in the 6502 family. If it was a 256 deep integer stack implemented as a split byte pushdown X-stack, and i++ is item #4 (zero-based) on the stack, it's just: "LDX TOS : INC STLO+4,X : BNE + : INC STHI+4,X : + ..."
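The effect of that four-instruction sequence on a split byte stack can be modelled in Python. This is a sketch under assumptions: the low and high bytes live in two separate arrays standing in for STLO and STHI, and the arrays are assumed large enough that TOS plus the item offset never runs off the end.

```python
def inc_stack_item(stlo, sthi, tos, offset=4):
    """Increment the 16bit item at stack position tos+offset in place."""
    i = tos + offset                        # LDX TOS, then index STLO+4,X / STHI+4,X
    stlo[i] = (stlo[i] + 1) & 0xFF          # INC STLO+4,X
    if stlo[i] == 0:                        # BNE + skips the next step unless the low byte wrapped
        sthi[i] = (sthi[i] + 1) & 0xFF      # INC STHI+4,X
```

Because the low and high bytes share the same index, no pointer arithmetic is needed to reach the high byte, which is where the saving over a general stack frame comes from.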
  10. Yes ... a retro youtubers chat, a members/retro programming chat, and a "hobbies other than those with a specific chat" chat would be good headings. If the main forum categories are Commander and non-Commander, it's an open question whether to add a members Commander programming chat to the Commander chat section and a members/retro non-Commander programming section to the non-Commander section, but I'd lean in favor.
  11. It's not all at random, though it's definitely not like a microcoded processor instruction set ... more like the 6502, which feels free to take an opcode that doesn't make sense for one type of operation and use it for another. That is:
aaa d rrrr: address-mode, direction, register
rrrr is the 16bit pseudo register, R0-R15
d=0: operand to ACC; d=1: ACC to operand
aaa is the operand address mode:
aaa=000, immediate (followed by 16bit immediate value)
aaa=001, register direct
aaa=010, register indirect post-increment (lower 8bits, upper 8bits cleared)
aaa=011, register double indirect post-increment
aaa=100, pre-decrement register indirect
aaa=110, pre-decrement register double indirect
... but "0000 rrrr" is a nonsense action (eg, you cannot store the accumulator to the number 768), so instead "rrrr" is a non-register operation. With all of the indirect loads and stores being post-increment, you only need one direction of pre-decrement to make a stack. HOWEVER, the single byte pre-decrement needs load AND store, so together they can do a move of a block of data from "back to front", if source is below destination and they overlap. So the single byte "POP" has both directions but the double byte one (to allow 16bit value stacks) only needs one direction.
Then there is arithmetic:
aaa s rrrr: arithmetic-op, sign, register
s = sign, 0=+ (plus), 1=- (minus)
aaa=101, sum, ACC = ACC +/- register, set branch carry, zero, negative conditions
aaa=110, sum value = ACC +/- register, set branch carry, zero, negative conditions, discard value
aaa=111, inc/decrement, register = register +/- 1
Of course, 6 load/store operations and 3 arithmetic operations do not fit into 3bits, except the comparison operation only needs to subtract, and double byte pre-decrement only needs to work in one direction, so that lets it fit together like a jigsaw puzzle.
Edit: Note that while the register in the bottom and the instruction at the top is for functional reasons, there is ONE instruction that is almost implied by the design, which is CPR Rn, since when beginning execution, the four operation bits end up in bits 1-4 of the Y register (for the instruction table look-up), and CPR uses that to give the index of the target for the subtraction, which the CPR instruction places in R13 rather than R0 (the accumulator). So the CPR opcode has to be $Dn, unless the CPR result register is relocated. And then that implies that the two-byte POP instruction is at $Cn, by the "jigsaw puzzle" logic above.
Since I was attempting a re-implementation, I focused on the description of the functioning of the operations rather than Woz's implementation. However, even with a different dispatch model, if trying to squeeze object size in a "Sweet 16 replacement", rather than optimizing for speed, I could imagine having a single indirect load and a single indirect store routine, which works out from the bits of the opcode and the status of the carry flag whether it is pre-decrement or post-increment and whether it is a single or double byte operation, covering 7 operations in two routines. Direct register moves could be handled by putting source in Y and destination in X, at the cost of using absolute rather than direct addressing for the Y-indexed operation, giving one routine for the two direct ones. One could imagine the immediate register load being run by the two-byte accumulator load, setting the indirect source register to R15, the PC register, and using Y-indexed store, so the immediate load is taken over by the single indirect load routine as well. Then at the cost of three more zero page bytes ... two more bytes in a dedicated "register 17" initialized to $0001, and one set to either $80 or $00 based on whether adding or subtracting ... setting up the correct target and operand index in X and Y would allow all five arithmetic operations to be done in a single routine. If that was done by shifting the instruction one bit to the left and using the carry flag and sign flag to split the code set into quarters, you might restrict the jump table to the $0n instructions, making it only 26-32 bytes long.
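The "aaa d rrrr" jigsaw described in this post can be written out as a lookup table in Python. This is a sketch for illustration only: the table and function names are made up here, and the mnemonics are the standard Sweet16 ones rather than anything specific to this re-implementation.

```python
# (aaa, d) -> mnemonic, following the "aaa d rrrr" layout; names are illustrative.
REGISTER_OPS = {
    (0b000, 1): "SET",    # immediate load of Rn
    (0b001, 0): "LD",     # register direct, operand to ACC
    (0b001, 1): "ST",     # register direct, ACC to operand
    (0b010, 0): "LD @",   # byte indirect, post-increment
    (0b010, 1): "ST @",
    (0b011, 0): "LDD @",  # double byte indirect, post-increment
    (0b011, 1): "STD @",
    (0b100, 0): "POP @",  # byte pre-decrement, both directions
    (0b100, 1): "STP @",
    (0b101, 0): "ADD",
    (0b101, 1): "SUB",
    (0b110, 0): "POPD @", # double byte pre-decrement, one direction only
    (0b110, 1): "CPR",    # compare shares aaa=110, it only needs to subtract
    (0b111, 0): "INR",
    (0b111, 1): "DCR",
}

def decode(opcode):
    """Return (mnemonic, low_nybble); $0n opcodes are non-register operations."""
    aaa = (opcode >> 5) & 0b111
    d = (opcode >> 4) & 1
    low = opcode & 0x0F
    if aaa == 0 and d == 0:
        return "NONREG", low
    return REGISTER_OPS[(aaa, d)], low
```

Here decode(0xD5) gives CPR on R5 and decode(0xC3) gives POPD on R3, the two halves of the aaa=110 slot.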
  12. Yes. That looks like a regression. Back then, I was primarily developing in VICE emulating a 65C02 in a C64, so the $CC00 branch saw more testing.
  13. One thing to be careful about is that somebody interfacing with the PS/2 keyboard they have doesn't guarantee that the same code will interface with every keyboard that obeys the PS/2 spec. One possibility, which is waiting on Michael Steil having time for the X16 project to open up again, is that the timeout on the 65C02/6522 code is just too fast when the code runs at 8MHz, and adjusting the timeout will fix the issue.
  14. I'm not going to say that at my age, it's getting toward a 10 inch 720p portable TV being a "retina" display for me ... ... but I ain't going to deny, either. After Christmas for the grandkids, things are tight enough now that buying a tin of Altoid mints for a "system in an Altoids tin" build pretty much exhausted my disposable income ... ... that is, not the parts of the "system in an Altoids tin" ... just the Altoids themselves ... ... but hopefully by March or April, there will be breathing room again.
  15. Yeah, if I was trying to use a Pi for it, a Pi 4 B and external M.2 NVMe SSD would be the bare minimum. Indeed, an advantage of the Pi 4 B over the Pi 400 is that there is a case available that allows you to put the "external" M.2 SSD inside the case, using a "U" USB 3.0 connector to pass the connection from the Pi 4 back to the SSD underneath. Tom's Hardware looks at how much faster the SSD is than relying on an SD card for a Pi 4. I feel like I might be OK with a Pi 4 B with SSD, but I am OK with my cheap Chinese laptop with the Gateway branding, and have neither the cash nor the time to spend on experimenting right now. And of course, if I had jumped in last year when I had more of both, it would have been more like a Banana Pi of some sort than a RPi, since I was in China. As far as "hide away from computer distractions and just write", a Pi Zero 2 W with SSD and LibreOffice might be good enough. A Pi 400 with external SSD might be good enough if there was a way to upgrade the keyboard.
  16. If you want a mailbox, and have active processing on both sides, you can use dual ported SRAM, though that might not be period specific enough.
  17. If you are just building something which isn't an FPGA, then a mix of the way the 80-column chip from the C128 interfaces and the way the Vera interfaces could be used. Two registers, one of which is the address register for the other. The "address" register is just a register, which allows for up to 256 register addresses for the second, which is a normal 65C02 register. But if you limit it to 128 register addresses, the top bit can specify whether it is going to be a read or a write, and if a read, it can pre-fetch the value for the data register location. Add a VRAM data port and two address registers, with the bottom two or three bits implied zero when the low address register is set, and incrementing after a read or write of the data port. A counter then provides the bottom bits, with the counter reset tied to the write of the bottom address register. That allows for 256K or 512K of VRAM. That whole approach is quite different from the strategy that was used for the VIC-20, C64, C128 etc., with an FPGA being the closest feasible hobbyist equivalent to the CBM approach of building a dedicated ASIC video chip. But there's obviously no rule about sticking closely to that approach ... building the display from discrete electronic components is more like the Apple I approach, though I take it that you are aiming at a substantially more capable display.
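The VRAM port part of this suggestion can be modelled in Python to check the address arithmetic. This is only a sketch: the class and method names are hypothetical, and it assumes the small counter simply wraps around rather than carrying into the address registers, which the post does not specify.

```python
class VramPort:
    """Hypothetical model: two 8bit address registers plus a small low-bits counter."""

    def __init__(self, counter_bits=2):
        self.counter_bits = counter_bits
        self.addr_hi = 0
        self.addr_lo = 0
        self.counter = 0
        self.vram = bytearray(1 << (16 + counter_bits))

    def write_addr_hi(self, value):
        self.addr_hi = value & 0xFF

    def write_addr_lo(self, value):
        self.addr_lo = value & 0xFF
        self.counter = 0                    # writing the low register resets the counter

    def address(self):
        base = (self.addr_hi << 8) | self.addr_lo
        return (base << self.counter_bits) | self.counter

    def _step(self):
        # Assumption: the counter wraps instead of carrying into the registers.
        self.counter = (self.counter + 1) & ((1 << self.counter_bits) - 1)

    def write_data(self, value):
        self.vram[self.address()] = value & 0xFF
        self._step()

    def read_data(self):
        value = self.vram[self.address()]
        self._step()
        return value
```

Sixteen register bits plus two counter bits address 256K, and three counter bits address 512K, which matches the sizes given in the post.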
  18. Yes, you can use a register to buffer a write from the 65C02 bus and buffer the five address lines to know where the write is going. The bottleneck is the read from the 65C02 bus. A Vera register is read in a single clock cycle, so you might need a register file for some of the Vera registers.
  19. That would seem to be the direct solution ... pull the PS2 SCL clock line low while leaving the data line high before starting to send data to the master over I2C, release it when done.
  20. Adafruit has "Programming SPI flash with an FT232H breakout", which is interesting on the point in parentheses.
  21. I think that rather the CX16 would poll the ATTiny for a key on one vertical refresh and poll for a mouse event on the next and then repeat. 1/30th of a second seems to me to be fast enough that the queue should not fill up, so there is no need for an NMI. "As yet no solution" could easily mean that the preference is to get the PS/2 straight from the 6522 if possible, and that is waiting until Michael Steil has time to work through suggested fixes. It could also easily mean that the preference is to use the ATTiny, and that is not working yet. That's among the reasons I am not keen on doing any tea leaf reading on comments like that.
  22. In the 80s, sure, it would have been separate VRAM just like the MOS 8563 in the C128, for exactly the same reasons. An integrated RAM component would have been prohibitively expensive. It would have been DRAM, for cost reasons. Today, the integrated RAM comes with the commodity priced FPGA part, so that's the cheaper approach. Whether it would have all been that integrated ... I guess it depends on whether it was TED philosophy Commodore or Commodore 128 philosophy Commodore.
  23. Yes, it was a windfall as a dependent of a veteran that enabled me to buy my C64 / 1541 / monochrome monitor system in 1981. I was expecting to have the best computer in my dorm when I got back to University, but there was a kid from a better heeled family who had an IBM-PC with dual disk drives and 128KB RAM! For quite a while, more expensive systems were things I read about in Byte Magazine, not things I was in the market for. I eventually scraped up enough to buy a daisywheel printer, but most of my games and programs were typed in from computer magazines rather than store bought. I had a truly atrocious assembler written in Basic, and the quirky but quite useful Busy Bee Perfect Writer as a word processor. I got a C128D plus external 1581 and 1571 with some of my Peace Corps readjustment allowance from teaching Math in Grenada in the West Indies, then fried it all too quickly after arrival by plugging the printer power tap into the datasette port upside down, frying the 8502, leaving me with a C64 with 1571 and 1581 drive, and daisywheel printer, that I ended up taking to grad school. ____________ I think the TurboGrafx-16 comparison is not all that far off ... it is based on a 65C02 processor (though with added bits) that can run at either 1.79MHz or 7.16MHz, and the TurboGrafx-16 video chipset used a 16bit datapath to have more processing bandwidth than a VIC-II generation graphical chip could handle. As 16bits at 7.16MHz is about 1/3 the bandwidth of 8bits at 50MHz, one would hope that Vera would outperform it. And while a PSG was built into the TG16's CPU with four channels of wavetable playback, a basic FM synthesis option from combining the first two of the channels, and a kind of PCM playback, the X16 Audio is also more capable. If the HuC6280 was still in production, they might have used that, and a CPU with 64KB logical address space, 2MB physical address space and PSG would have been locked in from the start ... but it's not, so it wasn't an option.
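The bandwidth comparison in this post is simple enough to check in a couple of lines of Python. The datapath widths and clock rates are the ones quoted in the post; the function name is just for illustration.

```python
def raw_bandwidth_mbit(width_bits, clock_mhz):
    """Peak raw datapath bandwidth in megabits per second."""
    return width_bits * clock_mhz

tg16 = raw_bandwidth_mbit(16, 7.16)   # TurboGrafx-16 video datapath, as quoted
vera = raw_bandwidth_mbit(8, 50.0)    # Vera figure, as quoted
ratio = tg16 / vera
```

That works out to roughly 115 Mbit/s against 400 Mbit/s, a ratio a little under 0.3, so "about 1/3" is a fair round figure.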
  24. I don't think the audio would be on the same chip, and AFAIU, there's no functional connection between the audio registers and the video registers, but if address pins were column / row strobed like the MOS 8563, that still hits the speed. The MOS 8563 was limited to 64KB and it had a system bus interface based on an address and a data register with 37 logical registers, with a byte based local address bus and row/column strobed local memory address, but that arrangement wouldn't have nearly the bandwidth required by Vera.