Snickers11001001

  1. I feel like I need to update the couple of programs I uploaded. But having been away from the forums during the flurry of recent updates, I seem to have missed a lot. So far, I know SCREEN has changed in terms of which modes are available and how they default. Also, TI$ seems to be broken or changed in behavior: I can't get TI$ to print anything other than "000000", whereas the intended behavior has thus far been to print elapsed time in HHMMSS format. Anyone know what's up with that? Beyond these, has anyone done a primer or summary of things that need to be looked at when bringing existing BASIC code up to scratch for the updated ROM version?
  2. Ed, completely unnecessary as the program stands now... you're right! I think that was probably a leftover from an alternative way of doing the IF/THEN branch I had toyed with briefly and just never pulled back out; then I got it stuck in my head that it was something I had done for a speedup. I doubt it makes more than a few jiffies' difference either way as the code stands now.
  3. Opaque, try this...

[code]
0 ti$="000000"
1 dimx,i,y,e,c:l=-0.001:m=150:n=80:dim m(255),d(141):graphic1,1:a=cos(.785)
2 fori=1to141:x=i-70:d(i)=x*x:next
3 foro=1tomstep5:e=a*o:i=o-70:c=i*i:fori=1to141:x=i+e:y=e+n*exp(l*(c+d(i)))
4 ify>=m(x)thenm(x)=y:draw1,x,abs(y-m):next:next:goto6
5 next:next
6 x=ti:char1,.,.,str$(x):end
[/code]

You got an error with the TI=. because on the original Commodore machines you reset BOTH the TI and TI$ system variables by setting the string TI$ to "000000" (on the X16 you can reset either one separately). You'll also see that I used a 'CHAR' command (which is how the 128 prints text to the bitmap) to display the elapsed jiffies. On my copy of VICE128 (an older version), the STEP 5 version plots in 16613 jiffies with my improvements; the original C128 version from the YouTube video takes almost 24000 jiffies. The C128 does not enjoy quite as much of a speed increase, percentage-wise, as optimizing on the X16 does, which I would attribute to (a) the C128's graphics drawing commands being slower than the X16's, since the 128 performs some bounds checking and has to compute a pixel location in memory using a slice-of-a-tile approach (which is how the C128 stores bitmaps in memory); and (b) the C128 having a slower BASIC parsing engine than the X16, which uses the faster parser from the C64. On the 128, BASIC has access to more than 64K of RAM, but therefore has to do some banking in its parser; the X16 (like the C64) limits BASIC to much less memory, but does not waste cycles on banking operations for every single character it parses. Also, I think desertfish has it right: since the 6502 floating point libraries in the Commodore machines are decent already, you would probably just call them from within any machine code implementation of this. So your further time savings in going to assembler would only come from eliminating the BASIC parser and variable handling routines. You'd get more speed, but not as much as you might hope, since the maths will still have to grind like they do now.
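Incidentally, here's a minimal sketch of the timer-reset difference I'm describing; the X16 side reflects R38 behavior as I understand it, so treat it as illustrative rather than gospel:

[code]
REM c64/c128: ti is read-only, and assigning ti$ resets both ti and ti$
TI$="000000"
REM x16 (r38): the two can be reset separately
TI=.          :REM clears just the jiffy counter
TI$="000000"  :REM clears the clock string
[/code]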
  4. It's not the "drawing speed" per se -- VERA is extremely fast, and the BASIC PSET command is already a pretty fast pipe into VERA's API. No, it's the math, parsing, and variable stores/fetches that are the speed road blocks here. The "EXP" function is especially nasty: implementing a polynomial expansion in 6502 code is a tall order, and that's a large part of what's going on when BASIC evaluates the inverse log function. Still, there's always room in BASIC for some speedups. I took your code and added a command to reset the "TI" system timer variable at the very beginning, and to print elapsed TI (i.e., jiffies) right before the END command. Result: 4289 jiffies on R38. (I'm just back on this forum after being away for a while and haven't yet had time to play with any of the new releases... lots of new developments, though... quite exciting!) Anyway, I played with it for about an hour and came up with the below. This revision does the exact same thing as your program, but takes only 2804 jiffies on R38... an approx. 34.6% time savings. As with all BASIC optimizations, it's now obtuse, opaque, and borderline unreadable. But that's the cost of making BASIC faster, alas.

[code]
1 TI=.:DIMX,I,Y,E,C:L=-0.001:M=150:N=80:DIM M(255),D(141):SCREEN$80:A=COS(.785)
2 FORI=1TO141:X=I-70:D(I)=X*X:NEXT
3 FORO=1TOMSTEP2.5:E=A*O:I=O-70:C=I*I:FORI=1TO141:X=I+E
4 Y=E+N*EXP(L*(C+D(I))):IFY>=M(X)THENM(X)=Y:PSETX,ABS(Y-M),5:NEXT:NEXT:GOTO6
5 NEXT:NEXT
6 PRINTTI:END
[/code]

See my thread in HOWTOs for an idea of what's going on here, but by and large what I did was eliminate intermediate variables; kick the multiplication that computes the squares now kept in the D() array (which the original did over and over within the inner loop, always with exactly the same results) out into an initialization pass; initialize variables in order of use (focusing on the inner loop); and do various other things to streamline the code for the BASIC parsing engine. Let me know if you have questions about anything in particular and I'd be happy to try and explain.
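To make the square-table change concrete, here's a minimal sketch of the idea in isolation; the loop bounds and constants are just illustrative, not the exact ones from the program above:

[code]
10 REM before: the square of (i-70) gets recomputed on every pass of the inner loop
20 FORO=1TO10:C=O*O:FORI=1TO141:X=I-70:Y=EXP(-0.001*(C+X*X)):NEXT:NEXT
30 REM after: compute each square once up front, then just fetch it from d()
40 DIMD(141):FORI=1TO141:X=I-70:D(I)=X*X:NEXT
50 FORO=1TO10:C=O*O:FORI=1TO141:Y=EXP(-0.001*(C+D(I))):NEXT:NEXT
[/code]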
  5. FRACTAL BOOGALOO, continued... Sorry for the delay, but I suddenly have a lot more personal things going on than expected. I will write this up in some detail later, but it will most probably be several weeks at the very least. Since I have a few minutes now, I wanted to at least put up the listing for what I was able to do to further optimize this one:

[code]
1 TI$="000000":GOSUB7::REM [-......-]
2 SCREEN.:Q=$9F20:FORI=.TO5:READX,Y:POKEQ+X,Y:NEXT:FORV=MTOM+S*239STEPS
3 FORH=TTOUSTEPS:X=.:Y=.:FORI=.TON:Q=X*Y:X=H+X*X-Y*Y:Y=V+Q+Q:NEXT
4 FORI=.TOF:Q=X*X:R=Y*Y:IFQ+R<LTHENY=V+2*X*Y:X=H+Q-R:NEXT:POKEO,.:NEXTH,V:GOTO6
5 POKEO,I:NEXTH:NEXT
6 A$=TI$+"":FORI=1TO6:POKE$819+I,ASC(MID$(A$,I,1)):NEXT:END
7 DIMX,Y,Q,R,V,H,L,I,O,F,N,S,T,U,M:L=4:S=1.125E-7:F=255:N=99:T=-.747345:O=$9F23
8 U=T+S*319:M=.08784:RETURN
9 DATA13,7,15,32,9,17,.,.,1,64,2,16
[/code]

The original code from Matt's thread takes 15 hours, 8 minutes, 49 seconds (i.e., 54,529 seconds) to put up the full fractal. After my changes, the program completes the full plot in 9 hours, 44 minutes, and 15 seconds (i.e., 35,055 seconds). All testing used the official release emulator and the R38 ROM. My updates still leave a very long plotting time, but they accomplish an approximately 35.7% time savings compared to the original. That's nothing to sneeze at. I will get around to writing this up, but in the interim ask any questions and I will get to them with responses when I can.
  6. I seem to remember that the old C64 'type-in' word processor Speedscript had something neat in its source code, which it treated as akin to BASIC's "ON GOTO", only implemented in assembler. It walked a command table to arrive at a count (e.g., the key pressed was found on the 5th entry, so '5' is the command), used that count to pull the destination routine's address from another table, pushed the necessary bytes onto the stack, and then did an RTS to simulate a return from a JSR, which in effect jumps to the selected routine.
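The assembler itself is long gone from my memory, but here's a minimal BASIC sketch of the dispatch idea that trick mimics; the keys and routines are made up purely for illustration:

[code]
10 REM c holds the command number found by scanning a table of keys (here just a-c)
20 GET A$:IF A$="" THEN 20
25 IF A$="Q" THEN END
30 C=.:IF A$="A" THEN C=1
40 IF A$="B" THEN C=2
50 IF A$="C" THEN C=3
60 IF C=. THEN 20
70 ON C GOSUB 100,200,300:GOTO 20
100 PRINT "COMMAND ONE":RETURN
200 PRINT "COMMAND TWO":RETURN
300 PRINT "COMMAND THREE":RETURN
[/code]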
  7. Fractal Boogaloo writeup:

A. Cogitating and understanding.

NOTE: I was going to stick in an aside about floating point precision here, but I think it will fit better later in the write-up, when I hopefully get to the end of the optimization process and show some screenshots. For now, let's start instead by considering the description from Wikipedia on programming the Mandelbrot... And here is the listing of Matt's program... notice that he's implemented the 'escape' mechanism just as described in the article:

[code]
10 SCREEN 0
30 POKE $9F2D,$07
40 POKE $9F2F,$20
50 POKE $9F29,$11
100 FOR PY=0 TO 239
110 FOR PX=0 TO 319
120 XZ = PX*0.000036/320-0.747345
130 YZ = PY*0.000027/240+0.08784
140 X = 0
150 Y = 0
160 FOR I=0 TO 355
170 IF X*X+Y*Y > 4 THEN GOTO 215
180 XT = X*X - Y*Y + XZ
190 Y = 2*X*Y + YZ
200 X = XT
210 NEXT I
215 I = I - 100
216 IF I=256 THEN I=0
217 B = 0
218 OS = $4000
219 Y = PY
220 IF Y < 153 THEN GOTO 230
221 IF Y = 153 AND PX < 192 THEN GOTO 230
222 B = 1
223 OS = -192
224 Y = PY-153
230 VPOKE B,OS+Y*320+PX,I
240 NEXT PX
260 NEXT PY
[/code]

The basic structure of the program is: [initialization] [rows_loop (pixels_loop {colors_loop} -output-)]

Initialization. Lines 10-50. This pokes the VERA registers, setting up the screen. No further comment/analysis necessary.

Outer loop ('rows_loop'): Encompasses everything from line 100 to the end. Indexed by 'PY' in the original code, this is basically the plotting of all the rows of pixels. A single iteration of the rows_loop performs everything necessary to drop a complete row of pixels onto the screen; the complete progression of all its iterations plots every row. Two loops are nested within this main loop.

Nested middle loop ('pixels_loop'): Indexed by 'PX' in the original code and comprising lines 110 to 240, this goes through each pixel location within a row (moving left to right), figures out its color, and includes the output stage that actually puts a pixel on the screen. A single iteration of the pixels_loop results in one pixel getting plotted within the then-current horizontal row; a complete run of all its iterations puts up all the pixels in a single row. The colors_loop is nested within it.

Nested inner-most loop ('colors_loop'): Indexed by 'I' in the original code, lines 160 to 210. This is the code that grinds out the color selection for a pixel, using the fractal math and that 'escape' mechanism. Each iteration updates x and y and checks whether x^2 + y^2 exceeds a threshold of 4. If it exceeds this escape threshold, execution jumps out of the loop with the iteration count at that point (i.e., the then-current value of the 'I' indexing variable) available for the output stage of the pixels_loop to use to derive the color of the pixel. If the colors_loop gets through all 356 iterations (0 to 355, both inclusive) without triggering the escape condition, execution falls out of the loop into the pixels_loop's output stage, which in turn forces a black pixel.

Output: The pixels_loop has the output code. At that point, the nested colors_loop has either completed a full run or escaped early. The output stage tweaks the value of the colors indexing variable 'I' so it's within the valid range of VERA's 0-to-255 color palette, with the IF/THEN branch at line 216 above forcing 'I' to 0 (black) if the colors_loop completed its entire run without an escape.
Remember, the only time 'I' gets there with a value of 256 is if the colors_loop exited after a full run: the 'NEXT' in the final iteration adds 1 to 355, execution falls through without another iteration when the 'NEXT' command determines the result, 356, is greater than the specified endpoint of the loop, and then line 215 subtracts 100, leaving 256. Based on the current pixel coordinates, the output stage uses a series of branches in lines 220 and 221 to execute or jump over adjustments to the VPOKE 'bank' parameter and an offset value. It then executes the VPOKE with some included maths to get the pixel of the selected color plotted at the right screen coordinate.

At 320x240 resolution, the program puts up 76,800 pixels. For each pixel, the inner-most colors_loop iterates up to 356 times (0 to 355, both inclusive). Matt tells us the code takes at least 100 iterations per pixel; the program code above ALSO tells us as much, since line 215 subtracts 100 from the value of 'I' before eventually VPOKEing it, and the X16 would error out with "ILLEGAL QUANTITY" if the number being poked were negative. I don't know how the 'escape' counts are actually distributed, but let's assume they average out to the middle of the range. That would mean (on average) about 178 iterations per pixel, which is to say the code in the colors_loop gets parsed and executed by the BASIC interpreter in the neighborhood of 13.67 million times to produce the full image... By the way, it takes the X16 over 38 minutes to run a completely empty FOR/NEXT loop that many times. (!!!)

But what about the 'magic' numbers in that code? What's with those calculations "XZ = PX*0.000036/320-0.747345" and "YZ = PY*0.000027/240+0.08784"? Here's what's what: the "-.747345" and ".08784" are the 'starting points' selected by Matt in the fractal x,y domain (which I think runs from -2 to 1 on the x/real axis, and from roughly -1.5 to 1.5 on the y/imaginary axis). The .000036 and .000027 are the sizes of the ranges being examined on each axis. The 320 and 240 are the number of 'points' he wants to sample within the respective ranges (i.e., a number of points corresponding to the screen resolution). So in the 'x' range, he wants to start at -.747345 and go from that number to an end point that is only .000036 larger, grabbing 320 evenly spaced numbers along the way. Same for the 'y' range, except starting at .08784 and going to an end point .000027 away, grabbing 240 numbers in that range. You might notice that 0.000036/320 and 0.000027/240 BOTH evaluate to 0.0000001125 (or 1.125E-07, as it will display on your X16). That's how zoomed in this is: that itsy-bitsy increment gets added to x,y to create each new starting point before running the colors_loop to see if/when the escape condition is met. If the 'full' Mandelbrot were mapped to the size of a football field, the width of the piece Matt's program is mapping onto the screen would be a teeny tiny spot on that field (i.e., fractions of a millimetre!). Pretty impressive that a BASIC derived from 1980s Commodore/Microsoft code has enough floating point precision to handle things at this level of zoom.
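To make the mapping concrete, here's a minimal sketch; the sample pixel (160,120) is an arbitrary choice of mine, and the constants come straight from lines 120-130 of Matt's listing:

[code]
10 REM step size on each axis: the sampled width divided by the pixel count
20 PRINT 0.000036/320, 0.000027/240   :REM both display as 1.125E-07
30 REM so pixel (px,py) lands at this point in the fractal's x,y domain:
40 PX=160:PY=120
50 XZ=-0.747345+PX*1.125E-7
60 YZ=0.08784+PY*1.125E-7
70 PRINT XZ,YZ
[/code]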
The thing that's striking here is that the math is not actually all that complex. There aren't any transcendental functions and no trigonometric specialty commands (although the escape threshold is, as a practical matter, checking whether a particular x,y point in the fractal domain wanders out of a circle of radius 2 as the inner-loop math accumulates). There aren't even any square roots, and the only exponentiation is a simple power of two. But it's doing that short set of simple operations over, and over, and over and over, and OVER. Also look at the calculations that adjust 'x' and 'y' in each iteration of that colors_loop: the values of both x and y at the end of each iteration depend on BOTH the values of x and y at the beginning of the iteration. There are no partial results (except in the very first iteration) that can be pre-calculated and reused. And while it is true that the Mandelbrot set is mirrored across the real axis, that only leads to an optimization when the plotted range straddles that axis of symmetry, which is not the case here. That makes speeding up the most cycle-intensive part of the program hard, especially in BASIC. As Matt put it in his thread, "The core driver of the time is the math that just needs to happen."

B. OPTIMIZATIONS ALREADY NOTED/PERFORMED BY OTHERS.

1. Evicting calculations. Scott's thread covers some of these. For example, one of his early moves was to kick the divisions out of the loops and into the initialization (.000036/320 and .000027/240 ALWAYS yield the same answers and do not need to be evaluated 76,800 times in the pixels_loop). One of his updates kicked the entire derivation of YZ out of the pixels_loop and put it in the rows_loop, since that expression only changes as the row number being worked on changes. And by the end of his optimization process, he reached the point where he simply added the result of the divisions (the .000036/320 and .000027/240) directly to the 'starting points' as adjusted by all prior increments.

2. Duplicated operations in the inner-most loop. Matt mentioned, and Scott implemented, a change based on the fact that both lines 170 and 180 of the original code calculate the squares of 'x' and 'y'. In the original code, line 170 performs 4 variable fetches and two multiplications to do this before even getting to the evaluation of the IF/THEN condition. Then line 180 does the same 4 variable fetches and two multiplications again as part of the rest of what gets calculated there. By introducing two other variables at the outset of the loop and setting them to x*x and y*y respectively (two variable stores, four variable fetches and two multiplies), the colors_loop can then simply fetch the results for use in both the IF/THEN and the expression in the following line. Net result: 8 variable fetches and 4 multiplies are replaced by 2 variable stores, 8 variable fetches, and only 2 multiplications for that same part of the calculations; in effect you trade two slow floating point multiplications for two quick stores. As long as the variables are initialized in order of frequency of use (see my explanation on the prior page of this thread), getting rid of those two multiplications is a net benefit. An added benefit of this change, noted by Scott, is the ability to avoid storing an intermediate value (held in the 'XT' variable in the original code): we eliminate the store of 'XT' when it is calculated and the fetch of 'XT' when its value is put into 'X', as well as the associated character parses by the BASIC interpreter.
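Here's that change in isolation, keyed to the line numbers of Matt's listing; the 'after' lines are my own sketch of the idea, not necessarily Scott's exact code:

[code]
REM before (original): the squares are computed twice per iteration
170 IF X*X+Y*Y > 4 THEN GOTO 215
180 XT = X*X - Y*Y + XZ
190 Y = 2*X*Y + YZ
200 X = XT
REM after: squares computed once into q and r, reused, and xt eliminated
170 Q=X*X : R=Y*Y : IF Q+R > 4 THEN GOTO 215
180 Y = 2*X*Y + YZ
190 X = Q - R + XZ
REM y is updated first (using the old x), then x from q and r, so the maths match
[/code]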
3. Using the VERA data port instead of VPOKEs. Matt's original thread linked above noted one of the most interesting optimizations from the X16 perspective: changing from VPOKEs to use of the VERA data port. The original code uses VPOKEs, which means that, based on the current row/column pixel coordinates, you need to ascertain the right VERA bank number and then the offset into the memory space of that bank for the row/column you are plotting. Since the output stage is within the pixels_loop, all of that work gets performed 76,800 times over the course of a run of the program. But VERA memory can also be filled using one of the VERA data ports. You do that by poking the high/low components of the address you want to start writing to into the VERA registers corresponding to one of the two data ports, and the auto-increment step into another. Then every time you write a value to the VERA data port, VERA itself moves its internal pointer, so the next write to the port puts the value at the next memory address according to the increment selected. No more worrying about VPOKE bank values or offsets from a memory starting point. So you save: (i) for every one of the 76,800 pixels, the stuff in original code lines 217 to 220 (3 variable stores, 2 variable fetches, and an IF/THEN with a less-than evaluation); (ii) from row 153 onward, another line 221 with its two variable fetches and an IF/THEN with two comparison evaluations; (iii) in the last 128 pixels of row 153 and in rows 154 to 239, another three variable stores, another variable fetch and a subtraction from original code lines 222 to 224; and (iv) all the bytes of BASIC parsing for the foregoing. AND if you use a variable to hold the VERA data port address, the POKE you substitute for line 230 has only two variable fetches, whereas the original VPOKE has three more variable fetches, two additions, and a multiplication, all run 76,800 times. Also, as I mentioned in Matt's thread, POKE is itself slightly faster in BASIC than VPOKE, not least because POKE parses only two arguments while VPOKE parses three.

4. Variable initialization. Scott covered this, and you'll remember the post I put on the prior page of this thread about the "nope, nope, nope..." process by which the BASIC on the X16 (and the C64, C= Plus/4 and C128) looks up regular scalar variables. The idea is to figure out the most-used variables in the inner-most loops and initialize them in priority of frequency of use, so that those used most often are also the fastest for BASIC to fetch and store.

5. Pleasing the parser gods. Finally, Scott's thread dealt with removing spaces, crunching lines, single-character variables, using '.' whenever one wants to parse a zero (0) value, and the like. Every little bit helps, especially with the stuff in that inner-most loop running millions of times.

Signing off for now...

OK, that gets us to the end of the ground already covered by Matt and Scott. I believe I have spotted a couple of things that should (hopefully?!) improve speeds even further. As I think I mentioned, I'm doing my testing by grinding out a selection of rows (30 rows per run, to be precise) from within the image, and actually including the output and plotting in my tests. I'm doing this because I need a variable to use for the VERA data-port address and I want to account for its impact and initialization priority in the benching.
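Here's a minimal sketch of the data-port setup I keep referring to, using the same registers my earlier listing pokes ($9F20-$9F22 for the address and increment, $9F23 for data port 0); the start address and fill values are arbitrary, purely for illustration:

[code]
10 REM point vera data port 0 at an address, with auto-increment of 1
20 POKE $9F20,0     :REM vera address bits 7:0
30 POKE $9F21,0     :REM vera address bits 15:8
40 POKE $9F22,$10   :REM address bit 16 low, increment of 1 in the high nibble
50 O=$9F23          :REM data port 0
60 REM each poke now lands one byte further along in vera memory
70 FORI=0TO319:POKEO,I AND 255:NEXT
[/code]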
Also, it turns out the use of the VERA data port and the resulting simple two-argument POKE will let me implement a nifty optimization. Based on testing so far, I think I've got some genuine improvements here, but I want to make a few more tweaks in the next few days to have something really concrete. I also think I may have stumbled onto something a little bizarre, but I want to dig into it deeper, including by running some tests on a C64 and a Plus/4 on a few different emulators, to see whether what I think is happening is actually a thing in Commodore BASIC that made its way to the X16. Sorry to be cryptic, but it is sort of weird and something I have yet to fully understand. Cheers.
  8. I generally think the community is the people, not the window dressing. But... it seems to me the site will survive under either forum software platform IF (and only if) the project itself gets off the dime at some reasonable point in the future. A lot of that depends on how much static the X16 ecosystem causes its founder and driver of forward progress (i.e., the 8BitGuy). I sense he's already not a fan of the site and prefers Facebook. It seems to me a controversy about a step down in functionality might be the sort of thing that would "harsh his vibe," and that could be a bad thing. The internet trolls (and competing projects?) might also seize on any hiccups during a forum software transition as grounds to neg the project itself, and again that could be less than optimal. I think those of us who support this thing ought to endeavor to preserve the status quo at least for an additional year. I can throw in $50. One caveat -- I do not favor any sort of 'badge' or special forum privileges for contributors. If people can contribute $, that's great, but I don't want to see elevated status for donors, since contributions to something like this can come in many forms, from pure enthusiasm, to coding, to moderating, to making demos, to making videos, etc. Just my take. Tom, thanks to you and everyone else who stepped up on the site for all your work.
  9. OPTIMIZING A SIMPLE PROGRAM, PART TWO: 'FRACTAL BOOGALOO...' This is a continuation of a discussion started in Software Library chat by SlithyMatt when he uploaded his neat BASIC routine that outputs an extreme zoom-in on a portion of the Mandelbrot fractal, with a 256-color display output at 320x240. That thread is here: https://www.commanderx16.com/forum/index.php?/topic/1773-new-demo-uploaded-fancy-mandelbrot-set-zoomed-plot/ It's a very computationally intensive program, and of course Matt's code is written to be straightforward and extremely easy to follow -- he's a great tutor/mentor, which is why his YouTube videos are so popular. But that means he wasn't necessarily aiming for speed hacks. That being the case, other members spit-balled about doing an optimized version in BASIC. I indicated I would be inclined to give it a try, but then real life interrupted and I'm only just getting around to it. Also (after I started working on this write-up yesterday) I noticed I'm really REALLY late to the game, as Scott has done a full thread documenting his work squeezing the best performance out of this neat BASIC routine. His thread can be found here: https://www.commanderx16.com/forum/index.php?/topic/1780-optimizing-basic-fancy-mandelbrot-set-zoomed-plot-calculations/ I highly recommend reading both of those threads before starting on this write-up. And in any event, I haven't done anything yet, so it will give you something to do while you wait for my next post.

As I noted in Matt's thread, I started by adding some code to Matt's original to capture its elapsed time. Since I'm too lazy to research the correct pokes to restore the VERA to a proper text display after the 320x240 plotting, I used a trick that self-modifies the BASIC listing to tuck the elapsed time into a REM statement. Then you reset the X16, use the 'OLD' command to restore the BASIC listing, and find the elapsed time in there. Anyway, it turns out a full plot under the original takes 15 hours, 8 minutes and 49 seconds using ROM version R38 on the official emulator.

In the posts that follow we're going to apply the same optimization process as earlier in this thread. First, we'll go through the code to really make sure we GROK (that's old-dude Heinlein-inspired slang for 'understand intuitively or on a deep level') exactly what the original code is doing. What are the loops? What are the branches and why are they there? What's the set-up like? What's the output stage like? Which things will BASIC do slower than it could? As part of that post, I plan to show you some wonkiness with Commodore-style (well, really all) floating point precision/rounding implementations that I expect could play into things when the numbers are really, really small, as with a fractal program. Second, I'll show you what I ultimately come up with as an optimized version and go through my changes and why/how they work. The only guarantee I'll make is that the changed program will be much more difficult to follow and understand than the original. Finally, I'll do a full run on the emulator and see what my version accomplishes when all is said and done. I won't lie, I'm a little doubtful that the empirical results will be anywhere near as good as with the little 'Proteus' demo earlier in this thread. But we'll do our best, recognizing that BASIC can only do so much with something that, at its core, grinds this much floating point math.
Please do ask questions as we go, which will make the process more interesting and informative for everyone. Stay tuned.
  10. Hey, folks. I haven't had time to play with X16 stuff the last few months for a variety of reasons. Better late than never, I am finally getting around to trying my hand at optimizing this neat little demo. I have Matt's original version running now to see exactly how long a full run takes on R38. To benchmark it I added two lines: the very FIRST line of the listing...

[code]
1 TI$="000000":REM [-......-]
[/code]

...and the very last line to be executed after all the plotting is done...

[code]
500 A$=TI$+"!":FOR I =1TO6:B$=MID$(A$,I,1):POKE$815+I,ASC(B$):NEXT
[/code]

What's going on here is that this added code resets the TI$ system variable at the very start, leaving some specifically formatted space in a REM statement for later; then, when plotting is all done, it grabs the current value of TI$, parses it, and pokes the ASC values of the characters representing elapsed time right into the BASIC listing at the location reserved in that REM statement. So the idea is that you run the program, and when it's done you type "RESET" and enter (or do a "CTL-R" on the emulator) to reset the X16 and get the regular text screen back; then you issue an "OLD" command to restore the BASIC listing. After that, when you list the program you will see the elapsed time of the just-completed run coded right into that REM statement at the beginning.

Anyway, Matt has already made a post listing the obvious stuff in terms of optimization. As he suggests, I'll be changing the output stage to use regular POKEs to the VERA data port to take advantage of its auto-increment ability and avoid all the maths calculating offsets, as well as the branches that figure out where things are in the image to select the correct VPOKE bank, etc. That's a big time savings, since the output runs 320x240 times (76,800 pixels put on the screen). Also, if I remember correctly, some testing I did last year revealed the regular POKE routine in BASIC takes a bit less time to execute on average than VPOKE.

The biggest time savings will come from really chewing on that inner-most loop that iterates over and over to arrive at the color for each pixel, as well as the mid-level loop that plots each pixel in a row using the results of that inner-most loop for the respective pixel. As Matt points out, the inner-most loop iterates at least 100 times per pixel, and up to 356 times. Aside from avoiding duplication of the x*x and y*y operations, I see some other interesting possibilities. The goal is going to have to be to absolutely minimize the number of BASIC operations. There's probably not much that can be done with the math; it is what it is. Yeah, I'll optimize the order of variable initialization and knock everything down to single-character variables where possible. But at least from my first glance, it seems the heavy lifting will be wonky C64 BASIC stuff related to the inner-most loops, rather than any sort of math shortcuts. For now, I think I'll test my changes by running output over two limited ranges, say the first 30 rows and the 30 rows starting at row 100. Something like that. I'll let everyone know how it goes.

EDITED: OK, the original code takes 15 hours, 8 minutes and 49 seconds to do a full run on R38. Wowzers. This is less time than you might come up with by just running a small section and extrapolating. The reason is that, while the inner-most loop that accounts for most of the execution time is a FOR/NEXT initiated to run from 0 to 355, in many (most?) cases it never reaches the end of the loop.
For example, any pixels that end up in the original C64 colors and grey-scale ranges at the very beginning of the VERA color palette are falling out of the loop very early in its possible range of iterations. We know from Matt's write-up that it always takes at least 100 iterations, but those colors occur when the plotting threshold is met within 32 additional iterations after the first 100 are done. So that loop is not going to make it anywhere near 355 most of the time. All that said, I'm going to do my optimization write-up at the end of my original BASIC optimizing thread in the "HOWTOs" section, to avoid mucking up this thread with things that aren't necessarily questions for the original author. The work on this program will start a couple of posts down on page 2 of that thread, which can be found here: https://www.commanderx16.com/forum/index.php?/topic/1488-basic-convertingoptimizing-a-simple-basic-program-from-another-commodore-platform-to-the-x16/page/2/#comments
  11. Wow, that's actually quite mature already. Is there a 'howto' to try it out on the X16 emulator? Is it a custom rom or something?
  12. Love, love, love these projects. If I can posit any advice from the cheap seats (i.e., from someone like me who absolutely has no ability to make such a thing, but who is a user of BASIC and enjoys it), it would be this: there are things that BASIC needs, but not everything from 'gen 3' BASICs like Visual Basic needs to be included or is feasible. Remember that at bottom this platform is still an 8-bit machine with no prefetch, no branch prediction, no modern pipelining, no cache memory, almost no hardware maths, and a 16-bit address space. It's running at 8MHz, maybe, unless (a) they can't get it to work at that speed, in which case it will be a 4MHz machine, or (b) they release the faster X8 FPGA version (which won't have the banked memory, ZP space, or space in the pages below $0800 that the X16 has) and which probably won't be able to fit all your features anyway. Just take a look at the discussion near the end of the "BASIC 2 vs BASIC 7" thread about the impact of just a few more instructions in the byte-fetching/parsing core: between the C64 and the later Plus/4 and C128, those few extra instructions hurt performance on the later machines with the better BASICs. If you're having to bank-switch, for example, surely it takes a hit.

Tokenizing a lot of stuff inline (e.g., constants, jumps, variable memory locations) is a great idea, but I suggest a simple escape-code structure using byte codes. The parser finds, say, the PETSCII code for '@' not inside quotes, and it knows the next two bytes are a small int in 16-bit signed format; it finds the PETSCII code for the English pound sign (which looks a bit like a mutant F), and it knows the next 5 bytes are the exponent and mantissa of a float; it finds the token for 'GOTO' or 'GOSUB', and it knows the next two bytes are the actual 16-bit address of the jump's destination in the code, instead of the PETSCII numeric representation of a line number; it finds the PETSCII code for '%', and it knows the next two bytes are the 16-bit address of the value, followed by the name of an int-style variable in memory. (At execution it just needs to fetch the value; during a LIST operation it grabs the variable name at that address+2 until it hits the terminator.) Yeah, OK, the modern way to do many of these things would be with a hash table, but I caution you to consider the performance impact on an 8-bit machine.

If you use the idea of inlining 16-bit addresses for jump locations to speed up execution, of course, then there are other issues. With line numbers, your LIST routine need only follow the 16-bit address, grab the line number stored at that address, and put it on the screen during a LIST; but with labels, you will need to set up a data structure (probably a linked list) that the interpreter can consult during LIST operations to regurgitate the labels or line numbers when the user lists the code, and that metadata has to get saved with the program. That's actually a better place to use banked memory... the performance cost of swapping banks is not as important when listing the code. And I don't think it's feasible to tokenize at runtime; it needs to happen as you enter things.
  13. Check out the links provided; they really do help. RND takes one of three kinds of parameter. RND(1) (or any positive parameter, generally) generates a random number between 0 and 1 based on the next 'seed' -- an internal number that is behind the maths of this pseudo-random number generator -- and updates the seed. RND(0) does the same thing but uses a byte of the internal clock as the seed. RND with a negative number [e.g., RND(-2.39997)] uses the supplied parameter as, in essence, the seed, which means calling it with the same negative number gives the same 'random' number between 0 and 1 every time.

So A=RND(1) is the basic function. But to get what you want, you have to do something with the random decimal value it spits out, which again is generated as a number greater than 0 and less than 1. If you want an integer (no decimal fraction) from 0 to 9, then reason it out: you need to use the INT() function too. You take that decimal number between 0 and 1 and multiply it by the highest number you want + 1 (assuming you want the entire range from 0 to that highest number). R=INT(RND(1)*10) will leave the variable 'R' holding a random number from 0 to 9. (Think about it: even if the RND() function itself returns .99999999, multiplying it by 10 gets you 9.9999999, and taking INT() of that returns 9.)

About the INT() function: it's not 'rounding' in the sense you might be used to, i.e., it does not 'round up' when the fractional component is greater than .5. Instead, INT() is a 'floor' style rounding, which means it always rounds down. With a positive number, say INT(9.9999), it disregards the decimal fraction and returns 9. But with a negative number, say INT(-10.999), it ALSO rounds down to the next integer, i.e., -11 here. All of that is why INT(RND(1)*10) gives you a value between 0 and 9.

So, if you want two random single-digit numbers, let's say in variables to use later, you need two calls of this function:

[code]
10 N1=INT(RND(1)*10)
20 N2=INT(RND(1)*10)
30 PRINT N1, N2
[/code]

EDIT: Just read more carefully and saw your thing about wanting either a 1 or a 2... that means you multiply by the DIFFERENCE between the lowest number and the highest number you want, plus 1. To get a lower boundary, you just add it as the base to your calculation, i.e.:

[code]
N1 = 1+INT(RND(1)*2)
N2 = 1+INT(RND(1)*2)
[/code]

To test these, try this FOR/NEXT loop from the command line:

[code]
FOR L=1 TO 10: PRINT 1+INT(RND(1)*2):NEXT
[/code]

After it prints its sequence of 1s and 2s, scroll your cursor up and hit ENTER on the same command again and watch the sequence change. Voila. Pseudo-randoms of either 1 or 2.

EDIT: Geez, I was pretty sloppy in answering this... so I went in and fixed some typos and added clarity. Substance is the same, it just reads better.
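Putting the whole pattern together in general form, here's a minimal sketch (LO and HI are just my own illustrative variable names) of picking a random integer anywhere from LO to HI, inclusive:

[code]
10 LO=1:HI=2  :REM the range you want (e.g., either a 1 or a 2)
20 FOR T=1 TO 10
30 N = LO + INT(RND(1)*(HI-LO+1))  :REM scale by the size of the range, floor it, add the base
40 PRINT N;
50 NEXT
[/code]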
  14. I'm glad you're still working on this. I don't care what happens with the X8/X16 thing/release. I will play this game on whatever emulator you specify if that's all that's available. Looks cool, and it's still really neat to watch someone move a project along like this in sorta real time.