Ed Minchau Posted November 22, 2021 Share Posted November 22, 2021 (edited) On 11/22/2021 at 2:20 AM, paulscottrobson said: You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%. This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes. It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo. You mean 1/64 rather than 1/16, but yeah, this is basically what I was saying (my idea works out to x/64+ x/1024 ~ x/60). Your idea works out to a total of 12 lines of code, just 6 copies of LSR highbyte ROR lobyte So that's 24 bytes, but it could be even more efficient: LDA lobyte LSR highbyte ROR A LSR highbyte ROR A LSR highbyte ROR A LSR highbyte ROR A LSR highbyte ROR A LSR highbyte ROR A STA lobyte that's 22 bytes. Then, to add 1/1024 of the original to the total and get a little closer to 1/60, LDX highbyte STX highbyte2 LSR highbyte2 ROR A LSR highbyte2 ROR A LSR highbyte2 ROR A LSR highbyte2 ROR A CLC ADC lobyte STA lobyte LDA highbyte2 ADC highbyte STA highbyte This gets within 0.4% of 1/60 and totals 49 bytes. Edited November 22, 2021 by Ed Minchau 1 Quote Link to comment Share on other sites More sharing options...
Fabio Posted November 22, 2021 Share Posted November 22, 2021 (edited) maybe you should use the fact that one full bye of movement is free i'll try this LDA highb-inp STA midbyte2 STZ highbyte2 ; because 16:8 is a 24 bit number lda lobyte-inp ldy #2 :loop ASL A ROL midbyte2 ROL highbyte2 LSR highb-inp ROR lobyte-inp DEY BNE :loop ;given priority on code compactness CLC ADC lobyte-inp STA lobyte2 LDA highb-inp ADC midbyte2 STA midbyte2 TYA ; same as LDA #0 because now Y is zero ADC highbyte2 STA highbyte2 RTS Edited November 22, 2021 by Fabio not finished posting Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted November 23, 2021 Share Posted November 23, 2021 On 11/22/2021 at 4:20 AM, paulscottrobson said: You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%. This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes. It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo. 1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4. Quote Link to comment Share on other sites More sharing options...
paulscottrobson Posted November 23, 2021 Share Posted November 23, 2021 On 11/23/2021 at 12:53 AM, BruceMcF said: 1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4. As it's a 16 bit value you could probably just divide the upper byte by 1024, which would be >> 8x >> 2 which would be quicker and only marginally less accurate. It might be quicker to decompose the value into 256.a+b , apply your calculation and simplify it. But I've got to go for a flu jab in a minute Quote Link to comment Share on other sites More sharing options...
Ed Minchau Posted November 23, 2021 Share Posted November 23, 2021 On 11/22/2021 at 5:53 PM, BruceMcF said: 1/64 + 1/1024 = 0.0166015625, which is about 0.3% off. That is 4*(1/256) and (1/256)/4. That's what my little algorithm above does. The result isn't 16.8, but can be shifted there. Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted November 23, 2021 Share Posted November 23, 2021 (edited) On 11/23/2021 at 6:43 AM, Ed Minchau said: That's what my little algorithm above does. The result isn't 16.8, but can be shifted there. Yes, I was replying to a comment in the first page of the comments without reading the second page yet. On 11/23/2021 at 4:35 AM, paulscottrobson said: As it's a 16 bit value you could probably just divide the upper byte by 1024, which would be >> 8x >> 2 which would be quicker and only marginally less accurate. ... Though for a 16.8 result, the high six bits of the lower byte are still part of the result, and the second from the bottom bit could be used for rounding ... indeed, do the /1024 second, so that the second from the bottom bit is in the carry. However, the /1024 means that high byte of the result is 0, so can be omitted. I get about 45 bytes. proc calculate_tick_rate: near ; X/Y = tick rate (Hz) - divide by approximately 60 and store to zsm_steps ; use the ZP variable as tmp space ; Actually (X/64)+(X/1024), which is within 0.3% of X/60. ; use the ZP variable as tmp space val1 := r11 frac1:= r12 val2 := r13 txa sty val1 asl rol val1 rol val1+1 asl rol val1 asl val1+1 sta frac1 txa sty val2 lsr val2 ror lsr val2 ror adc frac1 sta zsm_fracsteps lda va1 adc val2 sta zsm_steps lda val1+1 adc #0 sta zsm_steps+1 rts Edited November 23, 2021 by BruceMcF 1 Quote Link to comment Share on other sites More sharing options...
Ed Minchau Posted November 24, 2021 Share Posted November 24, 2021 On 11/23/2021 at 3:45 PM, BruceMcF said: Yes, I was replying to a comment in the first page of the comments without reading the second page yet. Though for a 16.8 result, the high six bits of the lower byte are still part of the result, and the second from the bottom bit could be used for rounding ... indeed, do the /1024 second, so that the second from the bottom bit is in the carry. However, the /1024 means that high byte of the result is 0, so can be omitted. I get about 45 bytes. Yep, we came up with nearly identical solutions, really just the entry and exit to the routine is the only difference. Either great minds think alike, or fools seldom differ. Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted November 25, 2021 Share Posted November 25, 2021 (edited) On 11/23/2021 at 10:29 PM, Ed Minchau said: Yep, we came up with nearly identical solutions, really just the entry and exit to the routine is the only difference. Either great minds think alike, or fools seldom differ. I patterned the entry and exit after the original routine. By taking 1/[1/60 - (1/64 + 1/1024)], which is around 15,000, the next power of 2 approximation is add X/16,384, that is, X/[(64)*(256)], which is already computed. The only change is that rather than just using the high bit from the low order part of val2, it is necessary to keep both bits and add to the "frac" for proper rounding. I think it is something like this (untested, and I am VERY tired from a 12 hour shift then 3 hours driving to pick up my grandkid for Thanksgiving with his family up here!! -- so could be making an obvious mistake!!): val1 := r11 frac1:= r12 val2 := r13 frac2 := r14 stz val1+1 ; this is for both /64 and /16384 ; +3 clocks txa ; +2 = 5 sty val1 ; +3 =8 asl ; +2 = 10 rol val1 ; +5= 15 rol val1+1 ; +5 =20 asl ; +2 = 22 rol val1 ; +5 = 27 asl val1+1 ; +5 = 32 sta frac1 ; +3 = 35 stx frac2 ; this is for /1024 ; +3 = 38 sty val2 ; +3 = 42 lda #0 ; +2 = 44 lsr val2 ; +5 = 49 ror frac2 +5 = 54 ror ; +2 = 56 lsr val2 ; +5 = 61 ror frac2 ; +5 = 66 ror ; Note: carry is clear ; +2 = 68 adc frac1 ; the /16,384 version, this is really "a byte below 16.8" ; +3 = 71 sta frac2+1 ; use high bit below for rounding if result is >= 128 ; +3 = 74 lda frac2 ; 3 = 77 adc val1 ; in /16384, this is the "frac" place of 16.8 ; +3 = 80 sta frac2 ; +3 = 83 lda val2 ; +3= 86 adc val1+1 ; in /16384. this is the low byte of the integer part of 16.8 ; +3 = 89 sta val2 ; +3 = 92 asl frac2+1 ; round up if high bit below the fractional part is set. ; +5 = 97 lda frac2 ; +3 = 102 adc frac1 ; the /64 version, "the REAL fractional part in 16.8 result" ; +3 = 105 sta zsm_fracsteps ; +4 = 109 lda val2+1 ; +3 = 102 adc val1 ; +3 = 105 sta zsm_steps ; +4 = 109 lda val1+1 ; +3 = 112 adc #0 ; +2 = 114 sta zsm_steps+1 ; +4 = 118 rts ; +5 (+6 JSR) = 129 If it works correctly, it would be accurate to within 0.0005% ... I am guessing around 130 cycles. Edited December 2, 2021 by BruceMcF Quote Link to comment Share on other sites More sharing options...
EvilSnack Posted November 26, 2021 Share Posted November 26, 2021 One way to divide any integer by 60, using only integer math: 1. Shift right by six bits. Save off this value. 2. Shift right by another four bits. Add this to the variable holding the saved-off value. 3. If you still have anything left in the shift variable, return to step 2. 4. The variable with the saved-off value will contain the quotient. Quote Link to comment Share on other sites More sharing options...
EvilSnack Posted November 27, 2021 Share Posted November 27, 2021 On 11/26/2021 at 3:41 PM, EvilSnack said: One way to divide any integer by 60, using only integer math: 1. Shift right by six bits. Save off this value. 2. Shift right by another four bits. Add this to the variable holding the saved-off value. 3. If you still have anything left in the shift variable, return to step 2. 4. The variable with the saved-off value will contain the quotient. If the most significant byte is 0x38 or lower, this will deliver more accurate results, and might be faster: 1. Shift left by two bits. Save off this value. 2. Shift right by four bits. Add this to the variable holding the saved-off value. 3. If you still have anything left in the shift variable, return to step 2. 4. Add 0x80 (this will round the final result to the nearest 1/60th). 5. Drop the least significant byte. Quote Link to comment Share on other sites More sharing options...
Ed Minchau Posted November 27, 2021 Share Posted November 27, 2021 On 11/26/2021 at 2:41 PM, EvilSnack said: One way to divide any integer by 60, using only integer math: 1. Shift right by six bits. Save off this value. 2. Shift right by another four bits. Add this to the variable holding the saved-off value. 3. If you still have anything left in the shift variable, return to step 2. 4. The variable with the saved-off value will contain the quotient. This is the same algorithm that Bruce and I both posted above. Must be close to optimal. 1 Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted November 27, 2021 Share Posted November 27, 2021 (edited) On 11/26/2021 at 11:21 PM, Ed Minchau said: This is the same algorithm that Bruce and I both posted above. Must be close to optimal. Yes ... "shift right six bits" is shift left two bits and treat the result as all being shifted down one byte. "Shift right another four bits" is ten bits total, so shift right two bits and treat the result as all being shifted down one byte. And "if you still have anything left, return to step 2" is 14 bits total, which is shift left two bits and treat the result as all being shifted down TWO bytes. I don't test to do 3, since the shifting has already been done, and since the target is to have a 16.8 result from a 16 bit dividend, so it makes more sense to simply go ahead and include it. 1/60 - [(1/64)+(1/1024)+(1/16,384)] ~= 4.07x10^(-6), or about 0.000407% discrepancy ... off by 4 parts in a million. That discrepancy should be imperceptible. Edited November 30, 2021 by BruceMcF 2 Quote Link to comment Share on other sites More sharing options...
EvilSnack Posted December 1, 2021 Share Posted December 1, 2021 My algorithm was more general-purpose, not assuming any particular length of the dividend. The reason it works is because the series (1/16) + (1/16)^2 + (1/16)^3 + ... approaches 1/15, and then it's only a matter of dividing by 4 to get the final result. Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted December 2, 2021 Share Posted December 2, 2021 On 12/1/2021 at 4:20 PM, EvilSnack said: My algorithm was more general-purpose, not assuming any particular length of the dividend. The reason it works is because the series (1/16) + (1/16)^2 + (1/16)^3 + ... approaches 1/15, and then it's only a matter of dividing by 4 to get the final result. My algorithm is arrived at heuristically, taking the first power of 2 larger than 60, finding the residual of (1/60)-(1/64), inverting that, finding the first power of two larger than the inverse of that residual, and repeating until the residual is clearly going to be imperceptible for the application at hand. That heuristic has no dependence on the length of the dividend. Quote Link to comment Share on other sites More sharing options...
Scott Robison Posted December 2, 2021 Share Posted December 2, 2021 My algorithm was fast! I don't know if it was correct, but it was fast. And if you're going to get an inexact answer anyway, why worry about how exact it is and just go with fast. Of course, I guess 230 cycles can be improved upon. LDA #0 is much faster, and it is guaranteed to be closer to the correct answer than infinity, so just use it! {slinking away} Quote Link to comment Share on other sites More sharing options...
ZeroByte Posted December 2, 2021 Author Share Posted December 2, 2021 On 12/1/2021 at 7:14 PM, Scott Robison said: My algorithm was fast! Yeah - you're just over 1 scanline of raster time. Mine's 2.5, General is 18 scan lines. 1 Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted December 2, 2021 Share Posted December 2, 2021 On 12/2/2021 at 12:02 PM, ZeroByte said: Yeah - you're just over 1 scanline of raster time. Mine's 2.5, General is 18 scan lines. I don't know if my last code above works correctly, but I think it's on the order of 130 clock cycles. Fully unrolled loops saves a lot of that, since even three byte shifts by two bits don't save a lot of space by looping. Quote Link to comment Share on other sites More sharing options...
Scott Robison Posted December 3, 2021 Share Posted December 3, 2021 On 12/2/2021 at 4:39 PM, BruceMcF said: I don't know if my last code above works correctly, but I think it's on the order of 130 clock cycles. Fully unrolled loops saves a lot of that, since even three byte shifts by two bits don't save a lot of space by looping. Could be. I didn't analyze the question of "how to efficiently divide by 60". I just tried to find the fastest way to code the original expression given. Not to say I found the fastest way, just that I had noticed all the bit shifts were 8 bits off another bit shift, so I was able to avoid most bit twiddling and just do repeated adds with the bytes aligned to different positions. Quote Link to comment Share on other sites More sharing options...
ZeroByte Posted December 3, 2021 Author Share Posted December 3, 2021 @BruceMcFI'll have to try it out. Do you know offhand how many bytes it compiles down to? Quote Link to comment Share on other sites More sharing options...
BruceMcF Posted December 3, 2021 Share Posted December 3, 2021 (edited) On 12/3/2021 at 12:29 AM, ZeroByte said: @BruceMcFI'll have to try it out. Do you know offhand how many bytes it compiles down to? If I am counting the opcodes correctly, its around 75 bytes. ~~~~~~~~~~~~~~~~~~~~~ @Scott Robison Quote Could be. I didn't analyze the question of "how to efficiently divide by 60". I just tried to find the fastest way to code the original expression given. Not to say I found the fastest way, just that I had noticed all the bit shifts were 8 bits off another bit shift, so I was able to avoid most bit twiddling and just do repeated adds with the bytes aligned to different positions. Yes, the difference is between finding an efficient way to find 0.016666... of something and an efficient way to find 0.01666259765625 of something. For a dividend like 14,400, the difference seems like it would be smaller than the precision of a 16.8 answer. Edited December 3, 2021 by BruceMcF Quote Link to comment Share on other sites More sharing options...
Recommended Posts
Join the conversation
You can post now and register later. If you have an account, sign in now to post with your account.