Does this seem eggregious to anyone besides me?

Recommended Posts

On 11/22/2021 at 2:20 AM, paulscottrobson said:

You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.

This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.

It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.

You mean 1/64 rather than 1/16, but yeah, this is basically what I was saying (my idea works out to x/64+ x/1024 ~ x/60).  Your idea works out to a total of 12 lines of code, just 6 copies of

LSR highbyte

ROR lobyte

So that's 24 bytes, but it could be even more efficient:

LDA lobyte

LSR highbyte

ROR A

LSR highbyte

ROR A

LSR highbyte

ROR A

LSR highbyte

ROR A

LSR highbyte

ROR A

LSR highbyte

ROR A

STA lobyte

that's 22 bytes.  Then, to add 1/1024 of the original to the total and get a little closer to 1/60,

LDX highbyte

STX highbyte2

LSR highbyte2

ROR A

LSR highbyte2

ROR A

LSR highbyte2

ROR A

LSR highbyte2

ROR A

CLC

STA lobyte

LDA highbyte2

STA highbyte

This gets within 0.4% of 1/60 and totals 49 bytes.

Edited by Ed Minchau
Share on other sites

maybe you should use the fact that one full bye of movement is free

i'll try this

LDA highb-inp

STA midbyte2

STZ highbyte2             ; because 16:8 is a 24 bit number

lda lobyte-inp

ldy   #2

:loop

ASL A

ROL midbyte2

ROL highbyte2

LSR highb-inp

ROR lobyte-inp

DEY

BNE :loop                   ;given priority on code compactness

CLC

STA lobyte2

LDA highb-inp

STA midbyte2

TYA                ;     same as LDA #0 because now Y is zero

STA highbyte2

RTS

Edited by Fabio
not finished posting
Share on other sites

On 11/22/2021 at 4:20 AM, paulscottrobson said:

You could approximate. The tick range isn't going to be that large, I wouldn't have thought. 1/60 (0.016667) is pretty close to 1/16 (0.015625), better done here obviously as 4/256. The error is about 6.5%.

This is a bit high, high enough to be noticeable perhaps, so you could have a fudge factor or factors to add based around a variety of tick rates, given the limited range of tick in practice. You could knock up some numbers in a spreadsheet and get it near enough for practical purposes.

It's even debatable whether there's any point in representing it as a fixed point fraction, I don't think many people are going to be able to tell a difference of 1-2% in tempo.

1/64 + 1/1024 = 0.0166015625, which is about 0.3% off.  That is 4*(1/256) and (1/256)/4.

Share on other sites

On 11/23/2021 at 12:53 AM, BruceMcF said:

1/64 + 1/1024 = 0.0166015625, which is about 0.3% off.  That is 4*(1/256) and (1/256)/4.

As it's a 16 bit value you could probably just divide the upper byte by 1024, which would be >> 8x >> 2 which would be quicker and only marginally less accurate. It might be quicker to decompose the value into 256.a+b , apply your calculation and simplify it. But I've got to go for a flu jab in a minute

Share on other sites

On 11/22/2021 at 5:53 PM, BruceMcF said:

1/64 + 1/1024 = 0.0166015625, which is about 0.3% off.  That is 4*(1/256) and (1/256)/4.

That's what my little algorithm above does. The result isn't 16.8, but can be shifted there.

Share on other sites

On 11/23/2021 at 6:43 AM, Ed Minchau said:

That's what my little algorithm above does. The result isn't 16.8, but can be shifted there.

Yes, I was replying to a comment in the first page of the comments without reading the second page yet.

On 11/23/2021 at 4:35 AM, paulscottrobson said:

As it's a 16 bit value you could probably just divide the upper byte by 1024, which would be >> 8x >> 2 which would be quicker and only marginally less accurate. ...

Though for a 16.8 result, the high six bits of the lower byte are still part of the result, and the second from the bottom bit could be used for rounding ... indeed, do the /1024 second, so that the second from the bottom bit is in the carry. However, the /1024 means that high byte of the result is 0, so can be omitted. I get about 45 bytes.

proc calculate_tick_rate: near
; X/Y = tick rate (Hz) - divide by approximately 60 and store to zsm_steps
; use the ZP variable as tmp space
; Actually (X/64)+(X/1024), which is within 0.3% of X/60.
; use the ZP variable as tmp space
val1 := r11
frac1:= r12
val2 := r13
txa
sty val1
asl
rol val1
rol val1+1
asl
rol val1
asl val1+1
sta frac1
txa
sty val2
lsr val2
ror
lsr val2
ror
sta zsm_fracsteps
lda va1
sta zsm_steps
lda val1+1
sta zsm_steps+1
rts

Edited by BruceMcF
Share on other sites

On 11/23/2021 at 3:45 PM, BruceMcF said:

Yes, I was replying to a comment in the first page of the comments without reading the second page yet.

Though for a 16.8 result, the high six bits of the lower byte are still part of the result, and the second from the bottom bit could be used for rounding ... indeed, do the /1024 second, so that the second from the bottom bit is in the carry. However, the /1024 means that high byte of the result is 0, so can be omitted. I get about 45 bytes.

Yep, we came up with nearly identical solutions,  really just the entry and exit to the routine is the only difference. Either great minds think alike,  or fools seldom differ.

Share on other sites

On 11/23/2021 at 10:29 PM, Ed Minchau said:

Yep, we came up with nearly identical solutions,  really just the entry and exit to the routine is the only difference. Either great minds think alike,  or fools seldom differ.

I patterned the entry and exit after the original routine.

By taking 1/[1/60 - (1/64 + 1/1024)], which is around 15,000, the next power of 2 approximation is add X/16,384, that is, X/[(64)*(256)], which is already computed. The only change is that rather than just using the high bit from the low order part of val2, it is necessary to keep both bits and add to the "frac" for proper rounding.

I think it is something like this (untested, and I am VERY tired from a 12 hour shift then 3 hours driving to pick up my grandkid for Thanksgiving with his family up here!! -- so could be making an obvious mistake!!):

val1 := r11
frac1:= r12
val2 := r13
frac2 := r14
stz val1+1 ; this is for both /64 and /16384 ; +3 clocks
txa ; +2 = 5
sty val1 ; +3 =8
asl ; +2 = 10
rol val1 ; +5= 15
rol val1+1 ; +5 =20
asl ; +2 = 22
rol val1 ; +5 = 27
asl val1+1 ; +5 = 32
sta frac1 ; +3 = 35
stx frac2 ; this is for /1024 ; +3 = 38
sty val2 ; +3 = 42
lda #0 ; +2 = 44
lsr val2 ; +5 = 49
ror frac2 +5 = 54
ror ; +2 = 56
lsr val2 ; +5 = 61
ror frac2 ; +5 = 66
ror ; Note: carry is clear ; +2 = 68
adc frac1 ; the /16,384 version, this is really "a byte below 16.8" ; +3 = 71
sta frac2+1 ; use high bit below for rounding if result is >= 128 ; +3 = 74
lda frac2 ; 3 = 77
adc val1 ; in /16384, this is the "frac" place of 16.8 ; +3 = 80
sta frac2 ; +3 = 83
lda val2 ; +3= 86
adc val1+1 ; in /16384. this is the low byte of the integer part of 16.8 ; +3 = 89
sta val2 ; +3 = 92
asl frac2+1 ; round up if high bit below the fractional part is set. ; +5 = 97
lda frac2 ; +3 = 102
adc frac1 ; the /64 version, "the REAL fractional part in 16.8 result" ; +3 = 105
sta zsm_fracsteps ; +4 = 109
lda val2+1 ; +3 = 102
adc val1 ; +3 = 105
sta zsm_steps ; +4 = 109
lda val1+1 ; +3 = 112
adc #0 ; +2 = 114
sta zsm_steps+1 ; +4 = 118
rts ; +5 (+6 JSR) = 129

If it works correctly, it would be accurate to within 0.0005% ... I am guessing around 130 cycles.

Edited by BruceMcF
Share on other sites

One way to divide any integer by 60, using only integer math:

1. Shift right by six bits. Save off this value.

2. Shift right by another four bits. Add this to the variable holding the saved-off value.

3. If you still have anything left in the shift variable, return to step 2.

4. The variable with the saved-off value will contain the quotient.

Share on other sites

On 11/26/2021 at 3:41 PM, EvilSnack said:

One way to divide any integer by 60, using only integer math:

1. Shift right by six bits. Save off this value.

2. Shift right by another four bits. Add this to the variable holding the saved-off value.

3. If you still have anything left in the shift variable, return to step 2.

4. The variable with the saved-off value will contain the quotient.

If the most significant byte is 0x38 or lower, this will deliver more accurate results, and might be faster:

1. Shift left by two bits. Save off this value.

2. Shift right by four bits. Add this to the variable holding the saved-off value.

3. If you still have anything left in the shift variable, return to step 2.

4. Add 0x80 (this will round the final result to the nearest 1/60th).

5. Drop the least significant byte.

Share on other sites

On 11/26/2021 at 2:41 PM, EvilSnack said:

One way to divide any integer by 60, using only integer math:

1. Shift right by six bits. Save off this value.

2. Shift right by another four bits. Add this to the variable holding the saved-off value.

3. If you still have anything left in the shift variable, return to step 2.

4. The variable with the saved-off value will contain the quotient.

This is the same algorithm that Bruce and I both posted above.  Must be close to optimal.

Share on other sites

On 11/26/2021 at 11:21 PM, Ed Minchau said:

This is the same algorithm that Bruce and I both posted above.  Must be close to optimal.

Yes ... "shift right six bits" is shift left two bits and treat the result as all being shifted down one byte. "Shift right another four bits" is ten bits total, so shift right two bits and treat the result as all being shifted down one byte. And "if you still have anything left, return to step 2" is 14 bits total, which is shift left two bits and treat the result as all being shifted down TWO bytes. I don't test to do 3, since the shifting has already been done, and since the target is to have a 16.8 result from a 16 bit dividend, so it makes more sense to simply go ahead and include it.

1/60 - [(1/64)+(1/1024)+(1/16,384)] ~= 4.07x10^(-6), or about 0.000407% discrepancy ... off by 4 parts in a million. That discrepancy should be imperceptible.

Edited by BruceMcF
Share on other sites

My algorithm was more general-purpose, not assuming any particular length of the dividend.

The reason it works is because the series (1/16) + (1/16)^2 + (1/16)^3 + ... approaches 1/15, and then it's only a matter of dividing by 4 to get the final result.

Share on other sites

On 12/1/2021 at 4:20 PM, EvilSnack said:

My algorithm was more general-purpose, not assuming any particular length of the dividend.

The reason it works is because the series (1/16) + (1/16)^2 + (1/16)^3 + ... approaches 1/15, and then it's only a matter of dividing by 4 to get the final result.

My algorithm is arrived at heuristically, taking the first power of 2 larger than 60, finding the residual of (1/60)-(1/64), inverting that, finding the first power of two larger than the inverse of that residual, and repeating until the residual is clearly going to be imperceptible for the application at hand.

That heuristic has no dependence on the length of the dividend.

Share on other sites

My algorithm was fast! I don't know if it was correct, but it was fast. And if you're going to get an inexact answer anyway, why worry about how exact it is and just go with fast.

Of course, I guess 230 cycles can be improved upon.

LDA #0 is much faster, and it is guaranteed to be closer to the correct answer than infinity, so just use it! {slinking away}

Share on other sites

On 12/1/2021 at 7:14 PM, Scott Robison said:

My algorithm was fast!

Yeah - you're just over 1 scanline of raster time. Mine's 2.5, General is 18 scan lines.

Share on other sites

On 12/2/2021 at 12:02 PM, ZeroByte said:

Yeah - you're just over 1 scanline of raster time. Mine's 2.5, General is 18 scan lines.

I don't know if my last code above works correctly, but I think it's on the order of 130 clock cycles. Fully unrolled loops saves a lot of that, since even three byte shifts by two bits don't save a lot of space by looping.

Share on other sites

On 12/2/2021 at 4:39 PM, BruceMcF said:

I don't know if my last code above works correctly, but I think it's on the order of 130 clock cycles. Fully unrolled loops saves a lot of that, since even three byte shifts by two bits don't save a lot of space by looping.

Could be. I didn't analyze the question of "how to efficiently divide by 60". I just tried to find the fastest way to code the original expression given. Not to say I found the fastest way, just that I had noticed all the bit shifts were 8 bits off another bit shift, so I was able to avoid most bit twiddling and just do repeated adds with the bytes aligned to different positions.

Share on other sites

@BruceMcFI'll have to try it out. Do you know offhand how many bytes it compiles down to?

Share on other sites

On 12/3/2021 at 12:29 AM, ZeroByte said:

@BruceMcFI'll have to try it out. Do you know offhand how many bytes it compiles down to?

If I am counting the opcodes correctly, its around 75 bytes.

~~~~~~~~~~~~~~~~~~~~~

Quote

Could be. I didn't analyze the question of "how to efficiently divide by 60". I just tried to find the fastest way to code the original expression given. Not to say I found the fastest way, just that I had noticed all the bit shifts were 8 bits off another bit shift, so I was able to avoid most bit twiddling and just do repeated adds with the bytes aligned to different positions.

Yes, the difference is between finding an efficient way to find 0.016666... of something and an efficient way to find 0.01666259765625 of something.

For a dividend like 14,400, the difference seems like it would be smaller than the precision of a 16.8 answer.

Edited by BruceMcF

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.