Scott Robison

  1. Yeah, the membrane keyboard wasn't the nicest thing in the world to use. My only other computer experience up to that point had been several PET 4032s (I think) at my school, which I thought felt great. Then came the TS 1000, which wasn't all bad, and it was a gift, so I hate to sound ungrateful. Then I bought my C=64.
  2. I came along a few years after you did, but I have "professional" or "extended amateur" experience with many of the same languages: BASIC, Pascal, Fortran, C, C++, Modula-2, and C#. As for machine language, I started with 6502 variants, then x86/x64. The favorite programming class I ever took was actually FORTRAN 77 (where I was later a TA), because the instructor taught structured programming in an inherently unstructured language. Up to that point all I knew was how to learn the syntax of a language, but that class taught me to think about programming more formally, so that I could write "structured code" even when writing assembly language (if I maintained discipline). There was also a one-semester class that exposed us to about a dozen languages to help us compare and contrast what they could do. Welcome aboard!
  3. You've got it generally how I see it. I've never used Smalltalk, but my thought was that the LIST command without args would actually give you a dictionary of the fragments / functions / subroutines / whatever. LIST name would list the named fragment. EDIT name would pull up a full screen editor that would allow the named fragment to be edited, and on exit it would do the bookkeeping to generate the p-code from the listable text. Floats usually aren't necessary, except for when they are. They have their place, but they shouldn't be the default data type if possible.
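To make that dictionary idea a little more concrete, here is a rough C sketch of what the interpreter might keep per fragment. Every name here (Fragment, find, list_all) is made up for illustration; a real implementation on an 8-bit machine would be assembly against fixed tables in banked RAM, not anything this fancy.

    /* Hypothetical fragment dictionary: LIST with no args walks the table,
       LIST name shows one entry's listable text, and EDIT name would rebuild
       that entry's p-code on exit.  Nothing here is real interpreter code. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char name[16];               /* fragment / function / subroutine name */
        const char *listing;         /* human readable text shown by LIST     */
        const unsigned char *pcode;  /* crunched form regenerated by EDIT     */
        unsigned pcode_len;
    } Fragment;

    static Fragment table[32];
    static int count = 0;

    static void list_all(void) {                 /* LIST with no argument */
        for (int i = 0; i < count; i++)
            printf("%-16s %u bytes of p-code\n", table[i].name, table[i].pcode_len);
    }

    static const Fragment *find(const char *name) {  /* LIST name / EDIT name */
        for (int i = 0; i < count; i++)
            if (strcmp(table[i].name, name) == 0) return &table[i];
        return NULL;
    }

    int main(void) {
        static const unsigned char pc[] = { 0x99, 0x00 };   /* placeholder p-code */
        table[count++] = (Fragment){ "HELLO", "PRINT \"HI\"", pc, sizeof pc };
        list_all();
        const Fragment *f = find("HELLO");
        if (f) printf("LIST HELLO:\n%s\n", f->listing);
        return 0;
    }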
  4. Perhaps this is the best way to do it. My thought of having the crunching routine do more work to create a more optimized executable form of the program was to spread out the work of "compiling" to one or a few lines at a time. If someone attempted to write a 20 KB program (let's say an average line length of 40 characters, so 500 lines of BASIC), the time to prepare to run would be more obvious than if it were spread out over 500 individual presses of the enter key. Using a completely made up number, if it takes 10 ms per line to crunch the program into its executable form, that would require 5 seconds from the time one typed RUN until it was actually running. 10 ms is probably far longer than it would actually take ... just thinking out loud. But we're dealing with a machine that won't have the exact same constraints as a C64: more ROM space, more banked RAM, faster storage, and a faster CPU clock. Maybe all those add up to a model where the complexity of this concept isn't as valuable as it would have been 30 to 40 years ago.

Agreed. My thought was to have every reference to a variable add it to a table if it isn't already present, so that crunching ensures the variable exists. Ditto with labels. If you have GOTO label, where the label hasn't yet been defined, you at least get a placeholder for the eventual label value. Each variable might include a ref count, so that when a line with a reference is deleted and the ref count hits zero, the space can be reclaimed.

When I designed a scripting language for PCBoard, my initial plan had been: 1, edit a script text file; 2, load the script, which would compile it to tokenized form at load time; 3, execute the transient token stream. This was running on 16-bit DOS and the load / compile / execute time was slow enough that I switched to a pre-compiled form, but we didn't attempt to have a script editor built into the BBS. My thoughts are basically how to merge the two ideas so that the "compile" is done gradually over the period of the program being written.

That's essentially what I had in mind. Tokens would either be constants, variables, or executable (operators, functions, statements). I was thinking it would be stack based, so that the statement token actually comes last; that way multiple statements are easily "crunched" into one sequence of tokens without a need for extra delimiters / markers.
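Here is a rough C sketch of that crunch-time variable / label table with reference counts. The names (Sym, sym_ref, sym_unref) are invented for the example, and a real implementation would be 65C02 assembly against fixed tables; this just shows the bookkeeping.

    /* Crunching a line calls sym_ref() for every variable or label it sees,
       so the symbol exists (possibly only as a placeholder) before RUN.
       Deleting a line calls sym_unref(); when the count hits zero the slot
       can be reclaimed.  Purely illustrative. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char name[8];
        int  refs;      /* how many crunched lines reference this symbol       */
        int  defined;   /* 0 = placeholder (e.g. GOTO to a not-yet-seen label) */
    } Sym;

    static Sym syms[64];
    static int nsyms = 0;

    static Sym *sym_ref(const char *name) {
        for (int i = 0; i < nsyms; i++)
            if (strcmp(syms[i].name, name) == 0) { syms[i].refs++; return &syms[i]; }
        Sym *s = &syms[nsyms++];                 /* no bounds check in this sketch */
        strncpy(s->name, name, sizeof s->name - 1);
        s->refs = 1;
        s->defined = 0;
        return s;
    }

    static void sym_unref(Sym *s) {
        if (--s->refs == 0) s->name[0] = '\0';   /* slot becomes reclaimable */
    }

    int main(void) {
        Sym *label = sym_ref("LOOP");   /* GOTO LOOP crunched before LOOP exists */
        sym_ref("I"); sym_ref("I");     /* two references to variable I          */
        printf("LOOP: refs=%d defined=%d\n", label->refs, label->defined);
        sym_unref(label);               /* the line with the GOTO was deleted    */
        printf("LOOP reclaimed: %s\n", label->name[0] ? "no" : "yes");
        return 0;
    }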
  5. Preface: I'm not trying to teach anyone anything, I'm just trying to commit some thoughts to writing and solicit feedback.

On the one hand we have Commodore / Microsoft BASIC. It tokenizes keywords into one or two byte tokens, but otherwise the line is stored in memory as typed. Execution is slow, but it is easy to list for the programmer to read, and the interpreter reparses the line at run time on every execution. At the other extreme we have machine language, which is very hard to read. Assembly makes it a little easier, but it is still difficult to read. In between we have various languages that can be compiled to some form of code, either machine code or p-code, but there is a "painful" compile / run / test cycle.

I think a more elaborate BASIC interpreter could improve on the execution speed of traditional BASIC by doing more work while crunching, trading a little less efficiency as the programmer is writing the code for improved efficiency after typing RUN. This BASIC would still allow a highly interactive experience for the programmer, without a time consuming edit / compile / link / run cycle.

As an example, consider a line of BASIC:

    10 PRINT 200+300*500+700

BASIC crunches it down to the following bytes (in hex):

    17 08     (pointer to next line)
    0A 00     (line number 10)
    99 20     (PRINT token followed by a space)
    32 30 30  (digits 2 0 0)
    AA        (add token)
    33 30 30  (digits 3 0 0)
    AC        (multiply token)
    35 30 30  (digits 5 0 0)
    AA        (add token)
    37 30 30  (digits 7 0 0)
    00        (end of line)

To execute that after typing RUN, BASIC has to read 22 bytes. First it notes that the keyword is PRINT. Skip the whitespace. Convert the digits 200 to a floating point number. Push it on the stack. Read the add operator, which means there has to be another expression to the right. Convert the digits 300 to a floating point number. Push it on the stack. Read the multiply operator, which means there is another expression to the right, and multiply has higher precedence than add. Convert the digits 500 to a floating point number. Push it on the stack. Read the add operator, which means there is another expression to the right, and add has lower precedence than multiply, so finish the multiply by popping 300 and 500 from the stack, multiplying them, and pushing the result (150000) back on the stack. Convert the digits 700 to a floating point number. Push it on the stack. Read the end of line marker. Pop 150000 and 700, add them, and push the result (150700) back on the stack. Pop 200 and 150700, add them, and push the result (150900) back on the stack. Now we have a single expression, so print it.

Imagine an alternative implementation that crunches the line to bytes as follows:

    xx yy     (pointer to next line; the exact value doesn't matter at the moment)
    0A 00     (line number 10)
    10        (length of "listable" crunched line in bytes [16])
    99 20     (PRINT token followed by a space)
    01 C8     (literal byte value 200)
    AA        (add token)
    02 2C 01  (literal word value 300)
    AC        (multiply token)
    02 F4 01  (literal word value 500)
    AA        (add token)
    02 BC 02  (literal word value 700)
    08        (length of "executable" crunched line in bytes)
    02        (offset of literal byte value 200)
    05        (offset of literal word value 300)
    09        (offset of literal word value 500)
    AC        (multiply)
    AA        (add)
    0D        (offset of literal word value 700)
    AA        (add)
    99        (PRINT)

Listing the code becomes more complex (slower) because there is more "uncrunching" to do. Entering a line becomes more complex (slower) because there is more "crunching" to do.
Running that line of code has to read 25 bytes instead of 22 bytes, but it doesn't have to convert strings to numbers, which results in much less machine code being executed. In my imaginary code above I'm using bytes and words to store the literal numbers, but we could store them in another larger but still preprocessed format (such as floating point) that is much faster for the interpreter to process at run time, rather than continually converting text to numbers. Of course, this benefit can be achieved in large part by storing numeric constants in variables, which only have to be converted once, and people do that already in BASIC when they are trying to optimize their code.

I'm not suggesting this is the exact alternative format that should be used for INC BASIC. Some of my thoughts include:

1. A full screen editor to edit blocks of code by name, rather than requiring line numbers.
2. Crunching a line would identify all the "tokens" in the line of text and store them in a table that includes variables, constants, and labels. In this way variable creation would be part of editing the code, rather than a step that takes place at run time as variables are encountered for the first time.
3. Constant expressions could be evaluated at crunch time.
4. Labels are basically just constant expressions, so there would not need to be any slow linked list search of where to GOTO or GOSUB next.
5. Inline assembly for small speed critical parts would be nice to have.
6. Support for more than just floating point expressions, since byte math is faster than word math, which is faster than floating point math.

In essence, the full screen editor would "compile" the text into a tokenized form, updating data structures as it went, so that when it came time to run the program, all it has to do is reset variables to zero / default values.

I welcome feedback. If you think it is the worst idea in the history of ideas, that's fine. I'm just thinking it could be a nice middle ground between existing BASIC and the machine language monitor, especially if it could be located in a ROM bank. The way to make code faster is to execute less of it, and I think something like this is at least an interesting thought experiment.
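To illustrate why the "executable" form is cheap to run, here is a tiny C sketch of the run-time loop: the literals are already binary and the operators are already in postfix order, so execution is just a push/pop stack machine. The token values are made up for the example (I have only reused AA / AC / 99 for add / multiply / PRINT); this is not a proposal for the real byte format.

    #include <stdio.h>

    enum { T_LIT = 0x01, T_ADD = 0xAA, T_MUL = 0xAC, T_PRINT = 0x99, T_END = 0x00 };

    int main(void) {
        /* 200 300 500 * + 700 + PRINT  -- i.e. PRINT 200+300*500+700.
           For simplicity every literal is a long here; the post above proposes
           picking byte / word / float representations at crunch time. */
        struct { unsigned char tok; long lit; } prog[] = {
            { T_LIT, 200 }, { T_LIT, 300 }, { T_LIT, 500 },
            { T_MUL, 0 }, { T_ADD, 0 },
            { T_LIT, 700 }, { T_ADD, 0 },
            { T_PRINT, 0 }, { T_END, 0 }
        };

        long stack[16];
        int sp = 0;

        for (int pc = 0; prog[pc].tok != T_END; pc++) {
            switch (prog[pc].tok) {
            case T_LIT:   stack[sp++] = prog[pc].lit; break;
            case T_MUL:   sp--; stack[sp-1] *= stack[sp]; break;
            case T_ADD:   sp--; stack[sp-1] += stack[sp]; break;
            case T_PRINT: printf("%ld\n", stack[--sp]); break;  /* prints 150900 */
            }
        }
        return 0;
    }

No string-to-number conversion and no precedence handling happens in that loop; all of it was paid for once, at crunch time.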
  6. I did use geoWrite for some tasks when I wanted to "pretty print" a document (though given the dot matrix printers I had access to, it wasn't really very pretty by modern standards). Having an 8-bit pseudo-Macintosh was kind of cool from a nerd / geek perspective...
  7. I used SpeedScript from COMPUTE!'s Gazette on my C=64 for many years, later with some updates to use the 80 column C=128 functionality in C=64 mode. I even wrote my high school graduation speech with it!
  8. Welcome from another newcomer.
  9. The story I linked to above, written by a one-time WordPerfect executive, basically credits WordStar 2000 with WordPerfect's eventual domination of the word processing market. Until that point a large number of people stuck with WordStar, warts and all, because it was what they knew. Once WordStar 2000 came out, a completely different program, people no longer had an incentive to stick with WordStar.
  10. My preferred pretty printing format at this point is Asciidoctor for similar reasons. I like being able to use an arbitrary text editor then convert it to HTML or PDF if needed (such as for my resume).
  11. Understood about detokenizing the tokenized code. That's how v2 BASIC already handles it. Also understood about reducing the amount of RAM, just contemplating ways to give people more efficient BASIC (or some other interpreted language) that is comfortable for people who don't have the desire or mindset to go assembly. It could only work with the idea of multiple banks of RAM (or with very simple programs that don't require much space).
  12. WordPerfect started out life as a Data General minicomputer application, and was ported to / adapted for many platforms over the years, including the Apple II. It was never a CP/M application. WordPerfect became a major player in much the same way Microsoft did with DOS: they provided the application under contract to third parties, but retained ownership that allowed them to sell it to other people / port it to other platforms / etc. The Wikipedia article lists some of the platforms supported at https://en.wikipedia.org/wiki/WordPerfect#Version_history. It came out for DOS in 1982 but took years to supplant WordStar, which had been ported quickly from CP/M to DOS. I read a free ebook version of the history of WordPerfect (the company, not the software) written by one of the early executives. It can be found at http://www.wordplace.com/ap/index.shtml if anyone is interested. I found it an interesting read, though I'm hardly unbiased: I had friends who worked for WordPerfect, it started at my university, and I even interviewed there in the late 80s / early 90s.
  13. Please note I'm not trying to denigrate your work at all. I'm just trying to think of ways to:

1. Write a native interpreter;
2. That does a more sophisticated tokenization / crunching process than v2 BASIC;
3. That still keeps around the original form of the source code so that it can be listed and edited.

What you've done is great from the perspective of having better tooling to emit BASIC compatible with the platform. My thoughts are more of an intermediary between v2 BASIC and assembly code: something that could still give the programmer an interactive feeling of immediacy by typing and running their code, but that spends more time optimizing. At this point it is just a thought exercise that I might never have the time to work on, but it is similar in spirit to what I did with PCBoard Programming Language. The source code was similar to BASIC without line numbers, and it went through a "compilation" phase to create the tokenized form. So if you wanted a program like:

    PRINTLN "2+3*4-6", 2+3*4-6

It would generate a tokenized form that looked like:

    PRINTLN 2 "2+3*4-6" NUL 2 3 4 * + 6 - NUL

where the first token indicated the statement, followed by a count of expressions, followed by postfix expressions terminated with NUL markers. Each of the tokens was just a reference to a variable table (even constants were stored as variables, because I was young and inexperienced and it was the first compiler I ever wrote). Then the BBS had a runtime system / VM in it that knew how to parse the token stream.

My first thought for tokenizing code in this theoretical BASIC interpreter is that it parses the line into compact tokens, then stores a copy of the human readable form of the token stream, then a copy of the "optimized" (not the best word) sequence of the tokens. So using the example above, maybe a serialized form of the line looks like this (labels are for convenient reference; it really is more or less just an array of tokens in three sub-sections):

    TOKEN_COUNT = 11
    TOKEN_0  = SPACE CHARACTER
    TOKEN_1  = PRINTLN
    TOKEN_2  = "2+3*4-6"
    TOKEN_3  = ,
    TOKEN_4  = 2
    TOKEN_5  = +
    TOKEN_6  = 3
    TOKEN_7  = *
    TOKEN_8  = 4
    TOKEN_9  = -
    TOKEN_10 = 6
    PRE_COUNT   = 12
    PRE_TOKENS  = T1 T0 T2 T3 T0 T4 T5 T6 T7 T8 T9 T10
    POST_COUNT  = 12
    POST_TOKENS = T1 T4 T2 NUL T4 T6 T8 T7 T5 T10 T9 NUL

This isn't extremely well thought through yet, just stream of consciousness ideas, but it could give one an interactive environment that allows listing and editing of existing statements, while eliminating a significant portion of the runtime cost. The PRE data keeps the niceties of original spacing and intuitive expression order, with order of operations support. The POST data has already processed the data to a greater extent than v2 BASIC does, so the interpreter can process the tokenized form more efficiently. This can never be as good as a real compiler or assembler that discards the original program text after doing the same transformations, but maybe it could be enough of an enhancement to justify larger BASIC tokenized text in exchange for faster speed. Or maybe not.
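For what it's worth, the PRE to POST transformation is just the standard infix-to-postfix conversion. Here is a stripped-down C sketch (single-digit operands and + - * / only); it is not PPL's actual tokenizer or anything from a real BASIC, just the shape of the algorithm.

    #include <stdio.h>
    #include <ctype.h>

    static int prec(char op) { return (op == '*' || op == '/') ? 2 : 1; }

    static void to_postfix(const char *infix, char *out) {
        char ops[32];
        int nops = 0, n = 0;
        for (const char *p = infix; *p; p++) {
            if (isdigit((unsigned char)*p)) {
                out[n++] = *p;                       /* operands go straight to output */
            } else {
                while (nops && prec(ops[nops-1]) >= prec(*p))
                    out[n++] = ops[--nops];          /* pop higher/equal precedence ops */
                ops[nops++] = *p;
            }
        }
        while (nops) out[n++] = ops[--nops];         /* flush the operator stack */
        out[n] = '\0';
    }

    int main(void) {
        char post[64];
        to_postfix("2+3*4-6", post);
        printf("%s\n", post);                        /* prints 234*+6- */
        return 0;
    }

Run against "2+3*4-6" it produces 2 3 4 * + 6 -, the same order as the POST_TOKENS above, so the run-time VM never has to think about precedence.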
  14. That's an excellent point about the overhead of reading the extra characters just to skip whitespace. Many probably already realize it, but when evaluating an expression, BASIC has to parse numbers every time. So using this little example:

    10 TI$ = "000000"
    20 FOR I = 1 TO 1000
    30 A = 200*300*500*700
    40 NEXT I
    50 PRINT TI

Running that took 1290 jiffies through the loop. Adding a line 15 and modifying line 30:

    15 W=200:X=300:Y=500:Z=700
    30 A = W*X*Y*Z

Running that only takes 716 jiffies (about 45% less time, because the four numeric literals no longer have to be parsed into floating point numbers each time through the loop).

I think there are many ways to approach a "better" BASIC interpreter, but the things that will always take more time are interpreting the human readable form of code (such as infix expressions) into something the computer can work with (converting digit sequences, applying operator precedence, and so on). I think it would be interesting to write an interpreter that tokenizes the lines beyond just replacing keywords with tokens: actually pre-evaluate digit strings into the equivalent binary formats, convert the infix notation to postfix notation so that the interpreter doesn't have to redo it every time, and replace constant expressions with their equivalent values. Other optimizations could be done as well. Those are things a compiler does, but a compiler also (generally, usually) discards the original text of the line afterward, as it is of no value to the computer. An interpreter that is intended to be used to both edit and run code would need to keep some information about the original format of the text so that it could be "listed" for humans. Perhaps these ideas belong in a different topic...
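The same effect in miniature, as a hedged C sketch: the first loop re-parses the digit strings on every pass, roughly what v2 BASIC does, while the second uses values converted once up front, which is what a smarter cruncher would give you for free. The function names are mine, and the point is the structure, not any particular timing number.

    #include <stdio.h>
    #include <stdlib.h>

    /* Re-convert "200", "300", ... from text on every pass through the loop. */
    static double run_reparsing(int iterations) {
        double a = 0;
        for (int i = 0; i < iterations; i++)
            a = strtod("200", NULL) * strtod("300", NULL)
              * strtod("500", NULL) * strtod("700", NULL);
        return a;
    }

    /* The literals were converted to binary once when the line was entered,
       so the loop only multiplies. */
    static double run_precrunched(int iterations) {
        const double w = 200, x = 300, y = 500, z = 700;
        double a = 0;
        for (int i = 0; i < iterations; i++)
            a = w * x * y * z;
        return a;
    }

    int main(void) {
        printf("%.0f\n%.0f\n", run_reparsing(1000), run_precrunched(1000));
        return 0;
    }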