Does anyone know how to improve the drawing speed of this BASIC code (originally for the C128)?


Opaque

I found this code in a YouTube video ("3D graphics on Commodore 128 in Basic 7.0") and typed it into the Commander X16 emulator. The only real changes were replacing the original DRAW command with the Commander's PSET command, plus line 10. Running it reminds me, painfully I'd say, how slow computers were in the 8-bit era. I know the Commander is supposed to run at least at 8 MHz, and I'm not sure how accurate the emulator is, but it feels like this thing will take ages 😉 So the question is: can this code be sped up without using assembler (because I can't program in it)? More specifically, can the drawing speed be improved somehow? (I don't know how to do that with VERA.) Thanks.

5 DIM M(255)
10 SCREEN$80
20 PI=3.14
30 A=COS (PI/4)
40 FOR Y=1 TO 150 STEP 2.5
50 E=A*Y
60 C=Y-70
70 C=C*C
80 FOR X=1 TO 141
90 D=X-70
100 Z=80*EXP (-0.001*(C+D*D))
110 X1=X+E
120 Y1=Z+E
130 IF Y1>=M(X1) THEN M(X1)=Y1: PSET (X1),(ABS(Y1-150)),5
 
140 NEXT X
150 NEXT Y
200 END
 


It's not the "drawing speed" per se. VERA is extremely fast, and the BASIC PSET command is already a pretty fast pipe into VERA's API.

No, it's the math, parsing, and variable stores/fetches that are the speed roadblocks here.

The EXP function is especially nasty. Implementing a polynomial expansion in 6502 code is a tall order, and that's a large part of what's going on when BASIC evaluates the inverse-log (exponential) function.
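For a feel of why EXP is so costly, here is a Python sketch of the usual shape of a software exponential: rewrite e^x as 2^t, split t into an integer and a fraction, evaluate a polynomial for the fractional part, then adjust the exponent. This is an illustrative model under stated assumptions, not the ROM routine: the polynomial is a plain Taylor series rather than the coefficient table the Commodore/X16 BASIC ROM actually uses, and the operation count covers only multiplies and divides.

```python
import math

# Illustrative model of a software EXP routine (NOT the ROM's coefficients):
# e**x = 2**t with t = x*log2(e); split t into integer n and fraction f;
# 2**f = e**(f*ln 2) comes from a polynomial; 2**n is an exponent adjustment.
LN2 = math.log(2.0)

def soft_exp(x, terms=9):
    """Return (approximation of e**x, number of multiplies/divides used)."""
    t = x / LN2                 # range reduction: e**x == 2**t
    n = math.floor(t)           # integer part, applied by ldexp at the end
    u = (t - n) * LN2           # fractional part mapped back: u in [0, ln 2)
    ops = 2                     # the divide and the multiply above
    # Horner form of the Taylor series 1 + u + u**2/2! + ... + u**(terms-1)/(terms-1)!
    acc = 1.0
    for k in range(terms - 1, 0, -1):
        acc = 1.0 + acc * u / k
        ops += 2                # one multiply and one divide per step
    return math.ldexp(acc, n), ops   # scale by 2**n (cheap exponent change)

value, ops = soft_exp(-11.2035)
print(value, math.exp(-11.2035), ops)
```

On a 6502, each of those eighteen operations is a multi-byte floating-point routine done in software, and the original program calls EXP once for every one of its 8460 points (60 rows of 141).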

Still, there's always room in BASIC for some speedups. 

I took your code, added a command to reset the TI system timer variable at the very beginning, and printed the elapsed TI (i.e., jiffies) right before the END command. Result: 4289 jiffies on R38. (I'm just back on this forum after being away for a while and haven't had time to play with any of the new releases yet... lots of new developments, though, quite exciting!)

Anyway, I played with it for about an hour and came up with the version below. This revision does exactly the same thing as your program but takes only 2804 jiffies in R38, roughly a 34.6% time saving. As with all BASIC optimizations, it's now obtuse, opaque, and borderline unreadable. But that's the cost of making BASIC faster, alas.
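The timing arithmetic is easy to check. TI counts jiffies, sixty per second, so a small Python helper converts the two R38 measurements into seconds and a percentage saving (the helper names are just for illustration).

```python
# TI counts jiffies: 60 per second on the X16 (and the Commodore 8-bits).
JIFFIES_PER_SECOND = 60

def seconds(jiffies):
    """Convert a TI reading to seconds."""
    return jiffies / JIFFIES_PER_SECOND

def percent_saved(before, after):
    """Share of the original run time that the optimisation removed."""
    return 100.0 * (before - after) / before

before, after = 4289, 2804   # R38 timings reported in this post
print(f"{seconds(before):.1f} s -> {seconds(after):.1f} s, "
      f"{percent_saved(before, after):.1f}% saved")
# prints: 71.5 s -> 46.7 s, 34.6% saved
```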

1 TI=.:DIMX,I,Y,E,C:L=-0.001:M=150:N=80:DIM M(255),D(141):SCREEN$80:A=COS(.785)
2 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
3 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E
4 Y=E+N*EXP(L*(C+D(I))):IFY>=M(X)THENM(X)=Y:PSETX,ABS(Y-M),5:NEXT:NEXT:GOTO6
5 NEXT:NEXT
6 PRINTTI:END

See my thread in HOWTOs for an idea of what's going on here, but by and large, here's what I did: eliminated intermediate variables; moved the multiplication that calculates the square of D, which was done over and over inside the inner loop (with exactly the same results every time), into an initialization step that keeps the values in an array; initialized variables in order of use (focusing on the inner loop); and did various things to streamline the code for the BASIC parsing engine.
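Moving D*D out of the inner loop is plain loop-invariant hoisting: (X-70)^2 depends only on the inner index, so it never changes from one row to the next. The Python model below counts only those multiplications for the 60 rows the outer loop produces (O = 1, 3.5, ..., 148.5). It is a sketch of the idea, not a measurement of BASIC itself, and the function names are hypothetical.

```python
def squares_inline(rows, cols=141):
    """Original shape: the square is recomputed inside the inner loop."""
    mults = 0
    out = []
    for _ in range(rows):
        for x in range(1, cols + 1):
            d = x - 70
            out.append(d * d)
            mults += 1
    return out, mults

def squares_table(rows, cols=141):
    """Optimised shape: squares computed once into D(), then only looked up."""
    table = [0] * (cols + 1)          # index 0 unused, like D(0) in BASIC
    mults = 0
    for x in range(1, cols + 1):
        table[x] = (x - 70) * (x - 70)
        mults += 1
    out = [table[x] for _ in range(rows) for x in range(1, cols + 1)]
    return out, mults

inline_vals, inline_mults = squares_inline(60)
table_vals, table_mults = squares_table(60)
print(inline_vals == table_vals, inline_mults, table_mults)
```

The values are identical, but 8460 multiplications shrink to 141 and the inner loop pays for an array fetch instead.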

Let me know if you have questions about anything in particular and I'd be happy to try and explain. 

 

 


Very impressive boost 😮 I just had to test how well it runs on the C128 (VICE x128 emulator), and lo and behold, it speeds that up too, a dramatic difference compared to the original! I just had to drop the neat TI=., because it throws a syntax error there =(

1 dimx,i,y,e,c:l=-0.001:m=150:n=80:dim m(255),d(141):graphic1,1:a=cos(.785)
2 fori=1to141:x=i-70:d(i)=x*x:next
3 foro=1tomstep5:e=a*o:i=o-70:c=i*i:fori=1to141:x=i+e
4 y=e+n*exp(l*(c+d(i))):ify>=m(x)thenm(x)=y:psetx,abs(y-m),5:next:next
5 next:next
6 end

Changes: in line 1, the Commander uses SCREEN$80 but the C128 needs GRAPHIC1,1; in line 3, I increased the STEP to 5 to give it less ground to cover; in line 4, I removed the GOTO6.

 

But now it runs so much more efficiently 😮 I'm very impressed by how well your solution works on the C128 too. I didn't know BASIC could be concatenated like this...

Just wondering, since I still don't know how to write assembly: how much would it speed up the raw calculation here? Any guesses? 😄

 


I assume most of the run time is spent doing floating-point math operations, which aren't any faster in assembly (unless you somehow manage to write a faster FP library than the one in the kernal). So my rough estimate is that an assembly version would be about twice as fast at most.
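That estimate is Amdahl's law in disguise: if a fraction f of the run time is floating-point work that assembly can't speed up, then even infinitely fast handling of everything else gives at most 1/f overall. A minimal Python sketch, where the fractions 0.5 and 0.7 are assumptions for illustration rather than measurements:

```python
def max_speedup(fp_fraction, other_speedup):
    """Amdahl's law: only the non-floating-point share of the run time
    gets faster; the FP share stays exactly as slow as before."""
    return 1.0 / (fp_fraction + (1.0 - fp_fraction) / other_speedup)

# Assumed (not measured) share of time spent inside the FP routines.
for f in (0.5, 0.7):
    print(f, round(max_speedup(f, float("inf")), 2))
# prints:
# 0.5 2.0
# 0.7 1.43
```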


Opaque,

 

Try this...

0 ti$="000000"
1 dimx,i,y,e,c:l=-0.001:m=150:n=80:dim m(255),d(141):graphic1,1:a=cos(.785)
2 fori=1to141:x=i-70:d(i)=x*x:next
3 foro=1tomstep5:e=a*o:i=o-70:c=i*i:fori=1to141:x=i+e:y=e+n*exp(l*(c+d(i)))
4 ify>=m(x)thenm(x)=y:draw1,x,abs(y-m):next:next:goto6
5 next:next
6 x=ti:char1,.,.,str$(x):end
 

You got an error with TI=. because on the original Commodore machines you reset BOTH the TI and TI$ system variables by setting the string TI$ to "000000" (on the X16 you can reset either one separately). You'll also see that I used a CHAR command (which is how the 128 prints text to the bitmap) to display the elapsed jiffies.

On my copy of VICE x128 (an older version), the step-5 version plots in 16613 jiffies with my improvements. The original C128 version from the YouTube video takes almost 24000 jiffies. The C128 doesn't get quite as large a percentage speedup from optimization as the X16 does, which I would attribute to two things. (a) The C128's graphics drawing commands are slower than the X16's, since the 128 performs some bounds checking and has to compute each pixel's location in memory using the slice-of-a-tile layout the C128 uses for bitmaps. (b) The C128 has a slower BASIC parsing engine than the X16, which uses the faster parser from the C64. On the 128, BASIC has access to more than 64K of RAM, but therefore has to do some banking in its parser, whereas the X16 (like the C64) is limited to much less memory but doesn't waste cycles on banking operations every time it parses a character of BASIC.
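To make point (a) concrete, here is a Python sketch of the address arithmetic for a VIC-II style hi-res bitmap, where the screen is cut into 8x8 cells of eight consecutive bytes. It assumes the 320x200 layout the C128's 40-column graphics shares with the C64 and a bitmap base of $2000; for contrast it also shows a plain row-major address, assuming the X16's 320x240 256-colour mode stores one byte per pixel in row order. The function names are hypothetical.

```python
def vic_bitmap_address(x, y, base=0x2000):
    """Byte address and bit mask of pixel (x, y) in a 320x200 VIC-II hi-res
    bitmap: 8x8 cells of 8 consecutive bytes, 40 cells per cell row."""
    if not (0 <= x < 320 and 0 <= y < 200):      # the bounds check
        raise ValueError("pixel outside the 320x200 bitmap")
    cell_row, line_in_cell = divmod(y, 8)
    cell_col, bit_index = divmod(x, 8)
    address = base + cell_row * 320 + cell_col * 8 + line_in_cell
    mask = 0x80 >> bit_index                     # leftmost pixel is bit 7
    return address, mask

def linear_address(x, y, width=320, base=0):
    """Row-major, one byte per pixel (assumed X16 320x240x256 layout)."""
    return base + y * width + x

print(hex(vic_bitmap_address(319, 199)[0]), linear_address(319, 199))
```

The cell layout means splitting both coordinates into cell and offset, then a read-modify-write of one byte with a bit mask; the linear layout is a single multiply-add followed by a plain byte store.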

Also, I think desertfish has it right: since the 6502 floating-point libraries in the Commodore machines are already decent, you would probably just call them from any machine-code implementation of this. So the further time savings from going to assembler would come only from eliminating the BASIC parser and variable-handling routines. You'd get more speed, but not as much as you might hope, since the math still has to grind like it does now.

 

 

Edited by Snickers11001001

Ed,

Completely unnecessary as the program stands now... you're right!

I think that was probably left over from an alternative way of doing the IF/THEN branch that I toyed with briefly. I never pulled it back out, and then got it stuck in my head that it was something I had done for a speedup. I doubt it makes more than a few jiffies' difference either way as the code stands now.

 

  • Super Administrators
On 5/11/2022 at 11:35 AM, Ed Minchau said:

Tested it out, it doesn't make a jiffy's difference. 

And it shouldn't... the way BASIC does GOTOs, the GOTO reads the link address and line number of line 5 regardless, and then moves on to line 6. Since the NEXT:NEXT:GOTO gets parsed and executed either way, the execution time of the two versions should be almost identical.
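A simplified Python model of the forward search described above: the program is treated as a list of line numbers in memory order, and a forward GOTO scans from the line after the current one, examining each line header until it reaches the target. This is an illustration under that assumption, not a transcription of the ROM's GOTO routine, and it ignores link pointers, tokenizing, and statement text.

```python
# Simplified model: the program's lines in memory order (only numbers matter here).
PROGRAM = [1, 2, 3, 4, 5, 6]

def goto_scan(program, current_line, target):
    """Return the line headers a GOTO examines, in order, before landing.

    Assumption: a forward GOTO starts at the line after the current one;
    any other GOTO starts at the top of the program."""
    if target > current_line:
        start = program.index(current_line) + 1
    else:
        start = 0
    visited = []
    for line in program[start:]:
        visited.append(line)
        if line >= target:       # lines are kept in ascending order
            break
    if not visited or visited[-1] != target:
        raise LookupError("?UNDEF'D STATEMENT")
    return visited

print(goto_scan(PROGRAM, 4, 6))   # prints: [5, 6]
```

From line 4, GOTO 6 has to look at line 5's header before landing on line 6, which is why the two versions cost almost the same.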

 


I changed @Snickers11001001's code above to find the minimum and maximum values of the argument of the EXP function: it ranges from -11.2035 to -0.001. By building a lookup table of that EXP function before the drawing loop, I got TI down to 2213, shaving off 9.3 seconds.
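The range is straightforward to confirm by brute force. This Python sketch walks the same grid as the BASIC loops (O = 1 to 150 step 2.5, I = 1 to 141) and records the smallest and largest EXP argument; the function name is illustrative.

```python
def exp_argument_range(step=2.5, o_max=150, cols=141):
    """Smallest and largest value of -0.001*(C+D*D) over the drawing grid."""
    lo, hi = float("inf"), float("-inf")
    o = 1.0
    while o <= o_max:                    # FOR O=1 TO 150 STEP 2.5
        c = (o - 70) ** 2                # C
        for i in range(1, cols + 1):     # FOR I=1 TO 141
            arg = -0.001 * (c + (i - 70) ** 2)
            lo = min(lo, arg)
            hi = max(hi, arg)
        o += step
    return lo, hi

print(exp_argument_range())   # about (-11.20325, -0.001)
```

The extremes come from O=148.5 with I=141 (78.5^2 + 71^2 = 11203.25) and O=71 with I=70 (1 + 0), so 11.2035 is a slightly rounded-up bound, which keeps the scaled table index inside the array.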

1 TI=.:DIMX,I,Y,E,C,Z:M=150:N=80:DIM M(255),D(141),F(255)
2 A=.7071:G=-255/11.2035:L=-0.001*G:SCREEN$80
3 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
4 FORI=0TO255:F(I)=N*EXP(I/G):NEXT
5 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E:Z=INT(L*(C+D(I))+.5)
6 Y=E+F(Z):IFY>=M(X)THENM(X)=Y:PSETX,ABS(Y-M),5
7 NEXT:NEXT:PRINTTI:END
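What does the table cost in accuracy? The Python sketch below rebuilds this 256-entry table and compares each lookup against the exact value over the whole drawing grid, once with rounding (INT(...+.5), as here) and once with plain truncation, which is what a bare float subscript such as F(L*(C+D(I))) does in Commodore BASIC. It's a model under those assumptions, not output from the X16.

```python
import math

# Rebuild the 256-entry table from the 2213-jiffy listing (J = 255).
N = 80
G = -255 / 11.2035          # line 2: G=-255/11.2035
L = -0.001 * G              # line 2: L=-0.001*G (so L > 0)
F = [N * math.exp(i / G) for i in range(256)]   # line 4: F(I)=N*EXP(I/G)

def worst_table_error(round_index=True):
    """Largest |F(index) - exact| over the whole drawing grid."""
    worst = 0.0
    o = 1.0
    while o <= 150:                          # FOR O=1 TO 150 STEP 2.5
        c = (o - 70) ** 2
        for i in range(1, 142):              # FOR I=1 TO 141
            s = c + (i - 70) ** 2            # C+D(I)
            exact = N * math.exp(-0.001 * s)
            # INT(L*s+.5) rounds; a bare subscript F(L*s) truncates
            idx = int(L * s + 0.5) if round_index else int(L * s)
            worst = max(worst, abs(F[idx] - exact))
        o += 2.5
    return worst

print(round(worst_table_error(True), 2), round(worst_table_error(False), 2))
```

With rounding the worst error stays under about 1.74 pixels (80*(1-EXP(-11.2035/510))); at the same table size, truncation roughly doubles that bound.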
Edited by Ed Minchau

And after playing with it a bit more: 2105, which is 11 seconds faster than @Snickers11001001's code and about 60% faster than @Opaque's code. I'm using R40, if that makes any difference. I can get it a little faster with F having only 256 elements, but then it doesn't look quite as good as their images, and this one is very close to theirs. Can we get it below 2000 jiffies?

1 TI=.:DIMX,I,Y,E,C:M=150:N=80:DIM M(255),D(141),F(384)
2 A=.7071:G=-384./11.2035:L=-0.001*G:SCREEN$80
3 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
4 FORI=0TO384:F(I)=N*EXP(I/G):NEXT
5 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E
6 Y=E+F(INT(L*(C+D(I))+.5)):IFY>=M(X)THENM(X)=Y:PSETX,ABS(Y-M),5
7 NEXT:NEXT:PRINTTI:END
Edited by Ed Minchau

1831 jiffies.

1 TI=.:DIMX,I,Y,E,C:M=150:N=80:DIM M(255),D(141),F(500)
2 A=.7071:G=-500./11.2035:L=-0.001*G:SCREEN$80
3 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
4 FORI=0TO500:F(I)=N*EXP(I/G):NEXT
5 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E
6 Y=E+F(L*(C+D(I))):IFY>M(X)THENM(X)=Y:PSETX,M-Y,5
7 NEXT:NEXT:PRINTTI:END
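This version also quietly swaps ABS(Y-M) for M-Y and >= for >. The first swap is only safe if Y never exceeds M=150, and the X subscript has to stay inside DIM M(255). A brute-force Python check over the same grid confirms both; it uses the exact exponential rather than the table (an assumption that shifts Y by less than two pixels), and the function name is hypothetical.

```python
import math

def peak_y_and_x():
    """Largest plotted height Y and largest M() subscript X over the grid."""
    peak_y, peak_x = float("-inf"), 0
    o = 1.0
    while o <= 150:                          # FOR O=1 TO M STEP 2.5
        e = 0.7071 * o                       # E=A*O with A=.7071
        c = (o - 70) ** 2
        for i in range(1, 142):              # FOR I=1 TO 141
            y = e + 80 * math.exp(-0.001 * (c + (i - 70) ** 2))
            peak_y = max(peak_y, y)
            peak_x = max(peak_x, int(i + e)) # subscript M(X) truncates X
        o += 2.5
    return peak_y, peak_x

print(peak_y_and_x())
```

The peak is about 131, well under 150, so Y-150 is always negative and M-Y gives the same result without the ABS call.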
	
Edited by Ed Minchau

1773 jiffies, now down below half a minute, although this is kind of cheating:

1 TI=.:DIMX,I,Y,E,C:M=150:DIM M(255),D(141),F(500)
2 A=.7071:L=.5/11.2035:SCREEN$80
3 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
4 FORI=0TO500:READN:F(I)=N:NEXT
5 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E
6 Y=E+F(L*(C+D(I))):IFY>M(X)THENM(X)=Y:PSETX,M-Y,5
7 NEXT:NEXT:PRINTTI:END
200 DATA 80,78.23,76.49,74.8,73.14,71.52,69.94,68.39,66.87,65.39
202 DATA 63.94,62.52,61.14,59.78,58.46,57.16,55.9,54.66,53.45,52.26
204 DATA 51.11,49.97,48.87,47.78,46.72,45.69,44.68,43.69,42.72,41.77
206 DATA 40.85,39.94,39.06,38.19,37.34,36.52,35.71,34.92,34.14,33.39
208 DATA 32.65,31.92,31.22,30.52,29.85,29.19,28.54,27.91,27.29,26.68
210 DATA 26.09,25.52,24.95,24.4,23.86,23.33,22.81,22.31,21.81,21.33
212 DATA 20.86,20.39,19.94,19.5,19.07,18.64,18.23,17.83,17.43,17.05
214 DATA 16.67,16.3,15.94,15.59,15.24,14.9,14.57,14.25,13.93,13.62
216 DATA 13.32,13.03,12.74,12.46,12.18,11.91,11.65,11.39,11.14,10.89
218 DATA 10.65,10.41,10.18,9.96,9.74,9.52,9.31,9.1,8.9,8.7
220 DATA 8.51,8.32,8.14,7.96,7.78,7.61,7.44,7.28,7.11,6.96
222 DATA 6.8,6.65,6.5,6.36,6.22,6.08,5.95,5.81,5.69,5.56
224 DATA 5.44,5.32,5.2,5.08,4.97,4.86,4.75,4.65,4.54,4.44
226 DATA 4.35,4.25,4.15,4.06,3.97,3.88,3.8,3.71,3.63,3.55
228 DATA 3.47,3.4,3.32,3.25,3.18,3.1,3.04,2.97,2.9,2.84
230 DATA 2.78,2.71,2.65,2.6,2.54,2.48,2.43,2.37,2.32,2.27
232 DATA 2.22,2.17,2.12,2.07,2.03,1.98,1.94,1.9,1.85,1.81
234 DATA 1.77,1.73,1.7,1.66,1.62,1.59,1.55,1.52,1.48,1.45
236 DATA 1.42,1.39,1.36,1.33,1.3,1.27,1.24,1.21,1.18,1.16
238 DATA 1.13,1.11,1.08,1.06,1.04,1.01,0.99,0.97,0.95,0.93
240 DATA 0.91,0.89,0.87,0.85,0.83,0.81,0.79,0.77,0.76,0.74
242 DATA 0.72,0.71,0.69,0.68,0.66,0.65,0.63,0.62,0.6,0.59
244 DATA 0.58,0.57,0.55,0.54,0.53,0.52,0.51,0.49,0.48,0.47
246 DATA 0.46,0.45,0.44,0.43,0.42,0.41,0.4,0.4,0.39,0.38
248 DATA 0.37,0.36,0.35,0.35,0.34,0.33,0.32,0.32,0.31,0.3
250 DATA 0.3,0.29,0.28,0.28,0.27,0.26,0.26,0.25,0.25,0.24
252 DATA 0.24,0.23,0.23,0.22,0.22,0.21,0.21,0.2,0.2,0.19
254 DATA 0.19,0.18,0.18,0.18,0.17,0.17,0.16,0.16,0.16,0.15
256 DATA 0.15,0.15,0.14,0.14,0.14,0.13,0.13,0.13,0.13,0.12
258 DATA 0.12,0.12,0.12,0.11,0.11,0.11,0.11,0.1,0.1,0.1
260 DATA 0.1,0.09,0.09,0.09,0.09,0.09,0.08,0.08,0.08,0.08
262 DATA 0.08,0.08,0.07,0.07,0.07,0.07,0.07,0.07,0.06,0.06
264 DATA 0.06,0.06,0.06,0.06,0.06,0.06,0.05,0.05,0.05,0.05
266 DATA 0.05,0.05,0.05,0.05,0.04,0.04,0.04,0.04,0.04,0.04
268 DATA 0.04,0.04,0.04,0.04,0.04,0.04,0.03,0.03,0.03,0.03
270 DATA 0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03,0.03
272 DATA 0.03,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02
274 DATA 0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02,0.02
276 DATA 0.02,0.02,0.02,0.01,0.01,0.01,0.01,0.01,0.01,0.01
278 DATA 0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
280 DATA 0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
282 DATA 0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
284 DATA 0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
286 DATA 0.01,0.01,0.01,0,0,0,0,0,0,0
288 DATA 0,0,0,0,0,0,0,0,0,0
290 DATA 0,0,0,0,0,0,0,0,0,0
292 DATA 0,0,0,0,0,0,0,0,0,0
294 DATA 0,0,0,0,0,0,0,0,0,0
296 DATA 0,0,0,0,0,0,0,0,0,0
298 DATA 0,0,0,0,0,0,0,0,0,0
300 DATA 0
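The DATA table doesn't have to be typed by hand. This hypothetical Python helper (not something from the thread) generates lines in the same format: 501 values of 80*EXP(-I*11.2035/500), rounded to two decimals with trailing zeros dropped, ten per line, numbered from 200 in steps of 2.

```python
import math

def data_lines(first_line=200, line_step=2, per_line=10, size=500,
               n=80.0, span=11.2035):
    """Build BASIC DATA lines holding n*EXP(-i*span/size) for i = 0..size."""
    def fmt(v):
        # two decimals, then drop trailing zeros: 74.80 -> 74.8, 0.00 -> 0
        return f"{v:.2f}".rstrip("0").rstrip(".")
    values = [fmt(n * math.exp(-i * span / size)) for i in range(size + 1)]
    lines = []
    for k in range(0, len(values), per_line):
        chunk = values[k:k + per_line]
        number = first_line + line_step * (k // per_line)
        lines.append(f"{number} DATA {','.join(chunk)}")
    return lines

for line in data_lines()[:2]:
    print(line)
```

Printing every string from data_lines() on its own line should reproduce lines 200 to 300 of the listing above.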
Edited by Ed Minchau

So, for a parametric, non-cheating method: 1833 jiffies for J=500 and 1784 jiffies for J=256. Just change the value of J to change the vertical scale resolution. Below J=100 or so, it starts looking like a ziggurat.

1 TI=.:DIMX,I,Y,E,C:M=150:N=80:J=256:DIM M(255),D(141),F(J)
2 A=.7071:G=-J/11.2035:L=-0.001*G:SCREEN$80
3 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
4 FORI=0TOJ:F(I)=N*EXP(I/G):NEXT
5 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E
6 Y=E+F(L*(C+D(I))):IFY>M(X)THENM(X)=Y:PSETX,M-Y,5
7 NEXT:NEXT:PRINTTI:END
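The ziggurat effect at small J is quantisation. F() is a geometric sequence, so the biggest jump between neighbouring entries is F(0)-F(1), at the top of the hump, and it equals 80*(1-EXP(-11.2035/J)). A short Python sketch of that formula (the function name is illustrative):

```python
import math

def largest_step(j, n=80.0, span=11.2035):
    """Largest height jump between neighbouring entries of F(0..J):
    F(0) - F(1) = n * (1 - exp(-span / j))."""
    return n * (1.0 - math.exp(-span / j))

for j in (100, 256, 500):
    print(j, round(largest_step(j), 2))
# prints:
# 100 8.48
# 256 3.43
# 500 1.77
```

At J=100 the top of the surface climbs in steps of about 8.5 pixels, which matches the terraced look; at J=500 the steps are under 2 pixels.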
	

Interestingly, TI does not scale linearly with J. With J=505, TI=1821. With J=420, TI=1801, almost exactly 30 seconds, but at J=419 and J=421 it's 1808. And at J=319 it's 1769, beating the top speed of the cheat.

Edited by Ed Minchau

By the way, since the code is already quite fast, changing the last value in line 6 (5, the VERA drawing colour) to something more fun, like the variable E, gives a more colourful result 😉
