BruceMcF

Members
  • Posts: 1027
  • Days Won: 29

BruceMcF last won the day on November 12

BruceMcF had the most liked content!

3 Followers

BruceMcF's Achievements

  1. The more period-correct way to avoid that kind of overwrite would be to end the load if it hits $9F00 ... maybe set carry to indicate the load was abandoned early. If that is built into the binary page adjustment, then you need the binary page adjustment to always take place on a true binary page boundary ... so if, eg, the start address was $8080, then the destination page would be $8000 with $80 in the index register, and when the index register hits zero, increment the high byte of the page address and check if it equals $9F ... if so, jump to the "load finished" handling. If the START address was in the I/O page, that wouldn't change anything. Which is also period-correct. If you start with an I/O page address, you better know what you are doing.
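The page-boundary check described in this post can be modelled in a few lines of Python. This is only a sketch of the idea, not X16 load code: `load_with_page_guard`, its parameters, and the returned `abandoned` flag (standing in for the carry flag) are hypothetical names introduced for illustration.

```python
def load_with_page_guard(memory, start, data, stop_page=0x9F):
    """Copy data into memory through a page pointer plus an 8-bit index.

    Returns (bytes_written, abandoned). 'abandoned' plays the role of the
    carry flag: True only if the load was cut short at the stop page.
    """
    page_hi = start >> 8       # high byte of a true page-aligned pointer
    index = start & 0xFF       # plays the role of the index register
    for count, byte in enumerate(data):
        memory[(page_hi << 8) | index] = byte
        index = (index + 1) & 0xFF
        if index == 0:                       # index wrapped: true page boundary
            page_hi = (page_hi + 1) & 0xFF
            if page_hi == stop_page:         # next write would land in $9F00
                return count + 1, count + 1 < len(data)
    return len(data), False


memory = bytearray(0x10000)
written, abandoned = load_with_page_guard(memory, 0x9E80, bytes(range(200)))
```

Because the check only fires when the pointer is incremented into the stop page, a load that starts inside the I/O page is not stopped, which matches the behaviour the post describes.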
  2. If I am counting the opcodes correctly, it's around 75 bytes. ~~~~~~~~~~~~~~~~~~~~~ @Scott Robison Yes, the difference is between finding an efficient way to find 0.016666... of something and an efficient way to find 0.01666259765625 of something. For a dividend like 14,400, the difference seems like it would be smaller than the precision of a 16.8 answer.
  3. I believe that the PRIMARY issue is that the design is not quite finished. I might speculate ... but it would JUST be speculation, since I am another outsider to the process ... fine-tuning the 65C02 PS/2 code timing or having the 8bit ATTiny micro-controller handle the PS/2 ports might well handle the last serious outstanding problem, then when mist has free time, the Kernel can be fixed up for a beta release, with things like the bit bang serial port and a range of the "not yet implemented" features of existing routines.
  4. I don't know if my last code above works correctly, but I think it's on the order of 130 clock cycles. Fully unrolled loops save a lot of that, since even three byte shifts by two bits don't save a lot of space by looping.
  5. My algorithm is arrived at heuristically, taking the first power of 2 larger than 60, finding the residual of (1/60)-(1/64), inverting that, finding the first power of two larger than the inverse of that residual, and repeating until the residual is clearly going to be imperceptible for the application at hand. That heuristic has no dependence on the length of the dividend.
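The heuristic in this post can be written out as a short Python sketch using exact fractions. The function name `power_of_two_terms` and its parameters are made up for illustration, and the loop picks the smallest power of two that is at least as large as the inverse of the residual, which for a divisor of 60 is the same as "the first power of 2 larger".

```python
from fractions import Fraction


def power_of_two_terms(divisor, max_terms):
    """Greedy series of 1/2**k terms that approaches 1/divisor from below.

    Returns (shifts, residual): the list of shift counts k, and the exact
    difference between 1/divisor and the sum of the 1/2**k terms.
    """
    residual = Fraction(1, divisor)
    shifts = []
    for _ in range(max_terms):
        if residual == 0:
            break
        inverse = 1 / residual          # 60, then 960, then 15360 for divisor 60
        k = 0
        while (1 << k) < inverse:       # smallest power of two >= inverse
            k += 1
        shifts.append(k)
        residual -= Fraction(1, 1 << k)
    return shifts, residual


shifts, residual = power_of_two_terms(60, 3)
```

For a divisor of 60 and three terms this gives shifts of 6, 10 and 14 bits (that is, 1/64 + 1/1024 + 1/16,384) and a residual of 1/245,760, about 4.07x10^(-6). Nothing in the loop depends on the width of the dividend.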
  6. Yes ... "shift right six bits" is shift left two bits and treat the result as all being shifted down one byte. "Shift right another four bits" is ten bits total, so shift right two bits and treat the result as all being shifted down one byte. And "if you still have anything left, return to step 2" is 14 bits total, which is shift left two bits and treat the result as all being shifted down TWO bytes. I don't test whether to do step 3, since the shifting has already been done, and since the target is to have a 16.8 result from a 16 bit dividend, it makes more sense to simply go ahead and include it. 1/60 - [(1/64)+(1/1024)+(1/16,384)] ~= 4.07x10^(-6), or about 0.000407% of the dividend ... off by 4 parts in a million of the dividend, which is 1 part in 4096 (about 0.024%) of the quotient. That discrepancy should be imperceptible.
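The byte-move trick in this post can be checked with a small Python model. This is a sketch of the arithmetic only, not the assembly routine: the function names are hypothetical, the input is assumed to be an unsigned 16-bit value, and no rounding is applied.

```python
def byte_shift_identities_hold(n):
    """Check that each long right shift equals a two-bit shift plus a byte move."""
    return (
        n >> 6 == (n << 2) >> 8        # left two bits, read one byte lower
        and n >> 10 == (n >> 2) >> 8   # right two bits, read one byte lower
        and n >> 14 == (n << 2) >> 16  # left two bits, read two bytes lower
    )


def div60_by_shifts(n):
    """Approximate n/60 as a truncated 16.8 fixed-point value."""
    n &= 0xFFFF
    left2 = n << 2               # n * 4, up to 18 bits
    right2 = n >> 2              # n / 4, low two bits dropped
    term_64 = left2              # n/64 in 16.8 is n*256/64 = n*4
    term_1024 = right2           # n/1024 in 16.8 is n*256/1024 = n/4
    term_16384 = left2 >> 8      # n/16384 in 16.8: the n*4 value one byte lower
    return term_64 + term_1024 + term_16384
```

For example, `div60_by_shifts(15360)` returns 65520, against an exact 16.8 value of 65536, which is the 1-part-in-4096 shortfall of the three-term series.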
  7. The functional equivalent of "how many games does it have?", for this project, would be the supply of skilled assembly language programmers for the processor in question who are willing to donate time to writing the low level Kernel code. Not just the "main guy", but suppose the "main guy" drops out, what are the chances of finding a replacement? There would be a lot more project risk in pursuing a project like this based on a CPU family other than the 6502 or Z80 families. However, once the X16 is released, then -- at least as the expansion card interface has been described so far -- it should be possible to do a "Super PET" approach and make a bus mastering 6809 card, for people who want to pursue that.
  8. I patterned the entry and exit after the original routine. By taking 1/[1/60 - (1/64 + 1/1024)], which is around 15,000, the next power of 2 approximation is add X/16,384, that is, X/[(64)*(256)], which is already computed. The only change is that rather than just using the high bit from the low order part of val2, it is necessary to keep both bits and add to the "frac" for proper rounding. I think it is something like this (untested, and I am VERY tired from a 12 hour shift then 3 hours driving to pick up my grandkid for Thanksgiving with his family up here!! -- so could be making an obvious mistake!!):

         val1  := r11
         frac1 := r12
         val2  := r13
         frac2 := r14

         stz val1+1        ; this is for both /64 and /16384 ; +3 = 3
         txa               ; +2 = 5
         sty val1          ; +3 = 8
         asl               ; +2 = 10
         rol val1          ; +5 = 15
         rol val1+1        ; +5 = 20
         asl               ; +2 = 22
         rol val1          ; +5 = 27
         rol val1+1        ; +5 = 32
         sta frac1         ; +3 = 35
         stx frac2         ; this is for /1024 ; +3 = 38
         sty val2          ; +3 = 41
         lda #0            ; +2 = 43
         lsr val2          ; +5 = 48
         ror frac2         ; +5 = 53
         ror               ; +2 = 55
         lsr val2          ; +5 = 60
         ror frac2         ; +5 = 65
         ror               ; Note: carry is clear ; +2 = 67
         adc frac1         ; the /16,384 version, this is really "a byte below 16.8" ; +3 = 70
         sta frac2+1       ; use high bit below for rounding if result is >= 128 ; +3 = 73
         lda frac2         ; +3 = 76
         adc val1          ; in /16384, this is the "frac" place of 16.8 ; +3 = 79
         sta frac2         ; +3 = 82
         lda val2          ; +3 = 85
         adc val1+1        ; in /16384, this is the low byte of the integer part of 16.8 ; +3 = 88
         sta val2          ; +3 = 91
         asl frac2+1       ; round up if high bit below the fractional part is set ; +5 = 96
         lda frac2         ; +3 = 99
         adc frac1         ; the /64 version, "the REAL fractional part in 16.8 result" ; +3 = 102
         sta zsm_fracsteps ; +4 = 106
         lda val2          ; +3 = 109
         adc val1          ; +3 = 112
         sta zsm_steps     ; +4 = 116
         lda val1+1        ; +3 = 119
         adc #0            ; +2 = 121
         sta zsm_steps+1   ; +4 = 125
         rts               ; +6 (+6 JSR) = 137

     If it works correctly, it would be accurate to within 0.0005% of the dividend ... by the running count, about 137 cycles including the JSR.
  9. Though as the Pi has gone, maybe the "Pi5" will be able to ... only time will tell.
  10. Yes, I was replying to a comment in the first page of the comments without reading the second page yet. Though for a 16.8 result, the high six bits of the lower byte are still part of the result, and the second from the bottom bit could be used for rounding ... indeed, do the /1024 second, so that the second from the bottom bit is in the carry. However, the /1024 means that the high byte of the result is 0, so it can be omitted. I get about 45 bytes.

          .proc calculate_tick_rate: near
          ; X/Y = tick rate (Hz) - divide by approximately 60 and store to zsm_steps
          ; Actually (X/64)+(X/1024), which is within 0.4% of X/60.
          ; use the ZP variable as tmp space
          val1 := r11
          frac1 := r12
          val2 := r13
          stz val1+1
          txa
          sty val1
          asl
          rol val1
          rol val1+1
          asl
          rol val1
          rol val1+1
          sta frac1
          txa
          sty val2
          lsr val2
          ror
          lsr val2
          ror
          adc frac1
          sta zsm_fracsteps
          lda val1
          adc val2
          sta zsm_steps
          lda val1+1
          adc #0
          sta zsm_steps+1
          rts
  11. 1/64 + 1/1024 = 0.0166015625, which is about 0.4% off from 1/60 (the ratio of the two is exactly 255/256). That is 4*(1/256) and (1/256)/4.
  12. A (preferably single cycle) DMA transfer into or out of Vera would be the closest you could come, as expanding the SPRAM module built into the FPGA is not going to happen.
  13. A RAM expansion similar to the C64 REU would be possible, since it's possible for expansion cards to take over the bus.
  14. Putting the general 16.8 / #60 -> 16.8 algorithm into the format of the original ... but using the new API scratch register space, on the theory that a general set-up process may be using any of r0-r10 for holding values used in the process. As with the original use of r0-r2, this shouldn't be placed in an interrupt routine.

      I also noticed that since the remainder is not the actual remainder, but the residual remainder after calculating the /256th fractional part, moving the loop indexing and exit saves two bytes, as the "rol rem" required for the first 24 iterations can be used in the final iteration to get the high bit of the residual remainder into the carry flag for rounding the final result. A third byte may be saved by omitting the "clc" at the start of the loop, which sets a bit that flows through but is not part of the final result. If carry could be set or clear at the beginning of the process, having this garbage bit can confuse debugging, but it doesn't affect the result.

      On the other hand, the "*32" pre-shift to speed up the process (when dividing by 60, the first five iterations are known to fail the test subtraction) adds around 11 bytes. The speed optimization is omitting one byte from the shifting, replacing five left shifts with an equivalent three right shifts, unrolling the loop, and being able to use the register for the shift.

          .proc calculate_tick_rate: near
          ; X/Y = tick rate (Hz) - divide by 60 and store to zsm_steps
          ; use the ZP variable as tmp space
          value := r11
          frac := r12
          rem := r13
          stx value
          sty value+1
          stz frac
          stz rem
          ; the first five trial subtracts will always fail ... for space optimization, just let them
          ldx #25 ; 24 trial subtracts, plus 1 partial loop to shift the final result bit in.
          ; may be omitted
          clc ; this bit will be shifted into bottom of residual remainder at end
          ; avoiding a possible garbage bit floating through makes it easier to debug
          loop:
          ; Shift dividend one bit left, then into remainder byte
          ; In iterations 1-24, the remainder is prepared for next trial subtract
          ; In iterations 2-25, the 24 result bits are shifted in to replace the value.
          rol frac
          rol value
          rol value+1
          rol rem ; in last iteration, residual remainder high bit is in carry
          dex
          beq endloop
          lda rem
          sec
          sbc #60
          bcc loop
          sta rem
          bra loop
          endloop:
          ; round up if residual remainder is >=$80
          ; high bit of residual remainder is already in carry
          lda frac
          adc #0
          sta zsm_fracsteps
          lda value
          adc #0
          sta zsm_steps
          lda value+1
          adc #0
          sta zsm_steps+1
          rts

      ~~~~~~~~~~~~~~~~~~~

      For the slight speed optimization since the first five iterations are known to fail:

          ; the first five trial subtracts will always fail, so pre-shift by five
          ; start at destination and shifting right by three gives the same effect
          stz value
          stz frac
          stx value+1
          tya
          lsr
          ror value+1
          ror value
          lsr
          ror value+1
          ror value
          lsr
          ror value+1
          ror value
          sta rem
          ; Now do the remaining 19 (of 24) shifts, plus 1 to shift in the final result bit
          ldx #20
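As a cross-check on the shift-and-subtract approach in this post, here is a small Python model of bit-by-bit division of a 16-bit value by 60 with a 16.8 result. It is a sketch of the general technique, not a transliteration of the routine: the name `div60_16_8` is hypothetical, and the sketch rounds to nearest by comparing the final remainder with half the divisor (30), which is a choice made here rather than the carry-based rounding in the post.

```python
def div60_16_8(n):
    """Divide a 16-bit integer by 60, returning a rounded 16.8 fixed-point quotient."""
    dividend = (n & 0xFFFF) << 8      # 24 bits: integer part plus a zero fraction byte
    quotient = 0
    rem = 0
    for bit in range(23, -1, -1):     # 24 trial subtractions, most significant bit first
        rem = (rem << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if rem >= 60:                 # trial subtract succeeds
            rem -= 60
            quotient |= 1
    if rem >= 30:                     # assumption of this sketch: round half up
        quotient += 1
    return quotient
```

The first five trial subtractions can never succeed, since at most five bits have been shifted into the remainder and 31 is less than 60, which is the fact the pre-shift optimization relies on.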