
Prog8 language and compiler


desertfish

7 hours ago, desertfish said:

Maybe start with familiarizing yourself with the classes that make up the AST (see the ast folder)

I think I'll start with the grammar file... just gotta find it.

7 hours ago, desertfish said:

Kotlin + IntelliJ IDEA

I hope it's emacs-friendly (they do have a Kotlin major-mode available). Edit: It is...

[screenshot: Kotlin major mode running in Emacs under WSL]

7 hours ago, desertfish said:

About the 16 pseudo registers: Well, they're not (yet) first-class citizens in the Prog8 language. However, because Prog8 allows you to define memory-mapped variables, they're pretty close to what I wanted anyway.

I was thinking of using them as "use once and throw away", i.e. as a mechanism to transfer parameters to functions: once inside the routine, the function copies the values into the local static memory it reserves for those parameters. In this manner, the calling program does not need to know the address of the static memory for any function's parameters (but it does need to know how many parameters there are and of which type). Passing objects is also possible if a pointer to the object is passed. Does Prog8 support pointers?

7 hours ago, desertfish said:

but to support that would require quite hefty changes to the parser and those changes don't really blend well I think with the original language as it should work for the C64

Supporting both the C64 and X16 was a negative point for me. It adds more complexity than is needed, especially since the two machines have different designs. If we focused just on the X16 (or made a fork), things would progress much faster. Trust me, when the X16 comes out, I will never ever touch a C64 again!


12 hours ago, geek504 said:

grammar file

Look for Prog8.g4 -- it's in the parser subproject. Antlr is used to generate the parser (and we process its output into a Kotlin-based AST later).

12 hours ago, geek504 said:

the calling program does not need to know the address of the static memory of the paramenters of any function

Currently this is "solved" by the code generator, which avoids copying things to intermediary places: the call site pokes the required arguments directly into the static storage of the subroutine's parameters, via appropriate labels in the generated assembly. Only if the arguments involve evaluating expressions is temporary evaluation-stack storage used before the resulting value is copied into the subroutine's parameters. See the FunctionCallAsmGen class if you want to study the details of the current implementation.

Maybe the subroutine calling convention can be improved. The return value is currently always passed via the evaluation stack, which is not optimal, and subroutines cannot be reentrant because their parameters are allocated statically rather than in a new stack frame for each call. I don't consider the latter a real problem on 6502 machines, though: just write a loop instead of recursion.
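
For illustration, here's a minimal Kotlin sketch of that "poke the arguments straight into the callee" idea -- not Prog8's actual FunctionCallAsmGen code; the class names, labels and the pop_eval_byte helper are all made up:

// Hypothetical sketch: trivial arguments are stored directly into the callee's
// parameter labels; anything needing evaluation goes through the eval stack.
sealed class Expr
data class Const(val value: Int) : Expr()
data class VarRef(val name: String) : Expr()
data class Binary(val left: Expr, val op: String, val right: Expr) : Expr()

fun genCall(sub: String, params: List<String>, args: List<Expr>, asm: MutableList<String>) {
    for ((param, arg) in params.zip(args)) {
        val target = "$sub.$param"            // label of the parameter's static storage
        when (arg) {
            is Const -> {                     // trivial constant: store it directly
                asm += "  lda  #${arg.value}"
                asm += "  sta  $target"
            }
            is VarRef -> {                    // trivial variable: copy it directly
                asm += "  lda  ${arg.name}"
                asm += "  sta  $target"
            }
            is Binary -> {                    // non-trivial: go through the eval stack
                asm += "  ; <evaluate expression, result pushed on eval stack>"
                asm += "  jsr  pop_eval_byte  ; hypothetical helper, leaves result in A"
                asm += "  sta  $target"
            }
        }
    }
    asm += "  jsr  $sub"
}

fun main() {
    val asm = mutableListOf<String>()
    genCall("multiply", listOf("a", "b"), listOf(Const(3), Binary(VarRef("x"), "+", Const(2))), asm)
    asm.forEach(::println)
}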

12 hours ago, geek504 said:

If we focused just on X16 (or make a fork) things would progress much faster

Perhaps - but don't forget that commanderX16 support only arrived a few weeks ago and this project is much much older 🙂 Also, the effort to include a second compilation target forced me to improve various internals of the compiler and fix a heap of bugs that I discovered by doing so. I do agree that a specialized fork is the only way to fine-tune it for the CommanderX16 in the short term, but I really would like to see those changes eventually land in the main compiler to support multiple targets. There's the concept of Compilation Targets and Machine Definitions in the compiler that teach it about the properties of the target machine.

The obvious first thing I will do, though, is create a specialized cx16 branch that can be used for development focusing on the CX16!


On 9/30/2020 at 11:07 AM, desertfish said:

Perhaps - but don't forget that commanderX16 support only arrived a few weeks ago and this project is much much older 🙂

Forgot about this detail!

It might take a while to start digging inside Prog8's source code... I'm getting familiar with Kotlin and ANTLR while fooling around with IntelliJ IDEA... all new tech to me! Being a hacker at heart, I just *have* to write a small interactive calculator using Kotlin and ANTLR to test my newfound knowledge!

Edit: I am seriously considering ditching Python and C# for Kotlin. It'll be just C, Kotlin, and assembly (6502, 68000, x86, and ARM) for me!

PROS: JVM, Java libraries, clean code like C# (a tad cleaner than Python), I prefer { } over indentation, statically typed, Android, better job opportunities in the Java field?

CONS: JVM (who wants to be dependent on Oracle?), not as well integrated with other technology, i.e. less mature, Kotlin/Native still being developed (iOS!)


Released new version: 4.4

I've just released a new version of Prog8. It contains various improvements and bugfixes that you can read about on the release page on GitHub. Also, quite unexpectedly, I've managed to optimize the generated code even further, and Prog8 is now actually a little bit faster in some microbenchmarks than optimized compiled C from cc65 🙂

See the initial post in this thread for download and documentation links.

 


Am I alone in having this great aversion to LL grammars, and thus an aversion to using ANTLR?

LALR(1) just makes sense to me and is so elegant, especially when using EBNF notation.

After checking out most lexers/parsers it seems to me that Lark is the sweetest thing since apple pie!

But I am still convinced to learn Kotlin 😀


I don't have a preference for one or the other, simply because I am not knowledgeable enough about parser generators to know the differences 🙂

What would Prog8's grammar look like in an LALR(1) definition? (as compared to the current prog8.g4 spec) Why would it be preferable?

I've chosen Antlr because it looked to be a high-performance, well-supported, and broadly used parser generator for Java (and thus, Kotlin). I'm open to better alternatives though. The parser is in its own project, so you could make another one next to it. The biggest effort would be the mapping into the AST nodes that the rest of the compiler uses. And perhaps dealing with parser errors.


Here's a sneak preview of my conversion of text-elite so far. While building it I've also been adapting the prog8 compiler itself, fixing bugs and adding features to make it possible to implement all this. Seems like the procedural galaxy generation is all working -- next up: the actual market and trading simulation.

[screenshot: text-elite conversion preview]


On 9/30/2020 at 10:07 PM, desertfish said:

 

Maybe the subroutine calling convention can be improved. The return value is currently always passed via the evaluation stack, which is not optimal, and subroutines cannot be reentrant because their parameters are allocated statically rather than in a new stack frame for each call. I don't consider the latter a real problem on 6502 machines, though: just write a loop instead of recursion.

Though not every use for a reentrant function can be handled by a loop.

However, if there is a specific "reentrant" call option, it could push the current parameters onto a frame stack and pop them when done, so calls that don't need re-entrance don't have to pay the overhead.
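
A little Kotlin model of that suggestion (purely illustrative, nothing Prog8-specific): the callee's statically allocated parameters get saved on a frame stack around a re-entrant call and restored afterwards, so ordinary calls pay nothing:

// Model of an opt-in "reentrant" call: save the callee's static parameter
// storage on a frame stack before the nested call, restore it afterwards.
// Names and layout are hypothetical.
val frameStack = ArrayDeque<IntArray>()

// Statically allocated parameter storage of some subroutine:
val factorialParams = IntArray(1)

fun factorial(): Int {
    val n = factorialParams[0]
    if (n <= 1) return 1
    frameStack.addLast(factorialParams.copyOf())   // "reentrant" call: preserve our parameter
    factorialParams[0] = n - 1
    val rest = factorial()
    frameStack.removeLast().copyInto(factorialParams)
    return n * rest
}

fun main() {
    factorialParams[0] = 5
    println(factorial())   // 120 -- recursion works despite the static parameter storage
}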


22 hours ago, desertfish said:

I don't have a preference for one or the other, simply because I am not knowledgeable enough about parser generators to know the differences 🙂

What would Prog8's grammar look like in an LALR(1) definition? (as compared to the current prog8.g4 spec) Why would it be preferable?

I've chosen Antlr because it looked to be a high-performance, well-supported, and broadly used parser generator for Java (and thus, Kotlin). I'm open to better alternatives though. The parser is in its own project, so you could make another one next to it. The biggest effort would be the mapping into the AST nodes that the rest of the compiler uses. And perhaps dealing with parser errors.

LR produces a much more complex state machine for the parser, but it is easier to write grammars for because we don't have to worry about left recursion and left factoring. An LR parser is also faster than an LL one, but if you were to write the parser yourself by hand, an LL parser is preferable and much easier to write -- but who does that nowadays?
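
To make the left-recursion point concrete, here's a tiny hand-written recursive-descent (LL-style) sketch in Kotlin, purely illustrative and unrelated to Prog8's actual grammar: the rule `expr : expr '+' term | term` translated naively recurses forever, while the rewritten form `expr : term ('+' term)*` becomes a simple loop; an LALR generator would have accepted the left-recursive rule as-is.

// Illustrative only: why left recursion breaks a naive recursive-descent parser.
class TinyParser(private val tokens: List<String>) {
    private var pos = 0
    private fun peek() = tokens.getOrNull(pos)
    private fun next() = tokens[pos++]

    // Grammar rule:  expr : expr '+' term | term
    // A literal translation calls expr() before consuming any input,
    // so it recurses forever (stack overflow) -- unusable in an LL parser.
    fun exprLeftRecursive(): Int {
        val left = exprLeftRecursive()   // <-- infinite recursion
        return if (peek() == "+") { next(); left + term() } else left
    }

    // Rewritten for LL:  expr : term ('+' term)*
    // The left recursion is replaced by a loop.
    fun expr(): Int {
        var value = term()
        while (peek() == "+") { next(); value += term() }
        return value
    }

    private fun term(): Int = next().toInt()
}

fun main() {
    println(TinyParser(listOf("1", "+", "2", "+", "3")).expr())  // prints 6
}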

While there are a few LR parsers for Java, I could not find anything specifically for Kotlin, and I didn't spend too much time trying to figure out the Java/Kotlin integration. Since your grammar is already done I wouldn't change it, and ANTLR is a fine product.

Quote

 

LL or LR?
This question has already been answered much better by someone else, so I'm just quoting his news message in full here:

I hope this doesn't start a war...

First - - Frank, if you see this, don't shoot me.  (My boss is Frank DeRemer, the creator of LALR parsing...)

(I borrowed this summary from Fischer&LeBlanc's "Crafting a Compiler")

  Simplicity    - - LL
  Generality    - - LALR
  Actions       - - LL
  Error repair  - - LL
  Table sizes   - - LL
  Parsing speed - - comparable (me: and tool-dependent)

Simplicity - - LL wins
==========
The workings of an LL parser are much simpler.  And, if you have to
debug a parser, looking at a recursive-descent parser (a common way to
program an LL parser) is much simpler than the tables of a LALR parser.

Generality - - LALR wins
==========
For ease of specification, LALR wins hands down.  The big
difference here between LL and (LA)LR is that in an LL grammar you must
left-factor rules and remove left recursion.

Left factoring is necessary because LL parsing requires selecting an
alternative based on a fixed number of input tokens.

Left recursion is problematic because a lookahead token of a rule is
always in the lookahead token on that same rule.  (Everything in set A
is in set A...)  This causes the rule to recurse forever and ever and
ever and ever...

To see ways to convert LALR grammars to LL grammars, take a look at my
page on it:
  http://www.jguru.com/thetick/articles/lalrtoll.html

Many languages already have LALR grammars available, so you'd have to
translate.  If the language _doesn't_ have a grammar available, then I'd
say it's not really any harder to write a LL grammar from scratch.  (You
just have to be in the right "LL" mindset, which usually involves
watching 8 hours of Dr. Who before writing the grammar...  I actually
prefer LL if you didn't know...)


Actions - - LL wins
=======
In an LL parser you can place actions anywhere you want without
introducing a conflict

Error repair - - LL wins
============
LL parsers have much better context information (they are top-down
parsers) and therefore can help much more in repairing an error, not to
mention reporting errors.

Table sizes - - LL
===========
Assuming you write a table-driven LL parser, its tables are nearly half
the size.  (To be fair, there are ways to optimize LALR tables to make
them smaller, so I think this one washes...)

Parsing speed - comparable (me: and tool-dependent)

--Scott Stanchfield in article <33C1BDB9.FC6D86D3@scruz.net> on comp.lang.java.softwaretools Mon, 07 Jul 1997.

 

Here are two BASIC grammars to compare; the "jvm" one is from ANTLR's repository.

Attachments: c64_basic_bnf.txt, jvmBasic.g4


2 hours ago, geek504 said:

I could not find something specifically for Kotlin, and I didn't spend too much time trying to figure out the Java/Kotlin integration.

While there seems to be somewhat of a Kotlin target for Antlr, I went with the default Java target and mapped that to Kotlin classes myself. The translation of Antlr's generated AST classes into my own Kotlin AST classes is done mostly via the extension methods principle taken from https://tomassetti.me/parse-tree-abstract-syntax-tree/  

In the end, I need my own custom AST node classes anyway, so I *think* the current solution is adequate; a native Kotlin output from the parser generator wouldn't give much extra benefit.
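
For readers unfamiliar with that technique, the pattern looks roughly like the Kotlin sketch below: an extension function on each ANTLR-generated context class builds the corresponding hand-written AST node. All class names here are invented stand-ins, not Prog8's real classes.

// Sketch of the "extension methods" mapping pattern from the tomassetti.me article.
// The *Context classes stand in for whatever the ANTLR-generated parser produces;
// the AST classes are our own.

// Hand-written AST nodes used by the rest of the compiler:
sealed class AstNode
data class NumberLiteral(val value: Int) : AstNode()
data class BinaryExpr(val left: AstNode, val op: String, val right: AstNode) : AstNode()

// Stand-ins for ANTLR-generated parse tree (context) classes:
open class ExpressionContext
class NumberContext(val text: String) : ExpressionContext()
class BinaryContext(val left: ExpressionContext,
                    val operator: String,
                    val right: ExpressionContext) : ExpressionContext()

// One extension function per context class converts it into our AST:
fun ExpressionContext.toAst(): AstNode = when (this) {
    is NumberContext -> NumberLiteral(text.toInt())
    is BinaryContext -> BinaryExpr(left.toAst(), operator, right.toAst())
    else -> throw IllegalArgumentException("unsupported parse tree node: $this")
}

fun main() {
    val parseTree = BinaryContext(NumberContext("2"), "+", NumberContext("3"))
    println(parseTree.toAst())   // BinaryExpr(left=NumberLiteral(value=2), op=+, right=NumberLiteral(value=3))
}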


I've just released Prog8 4.5 with a bunch of improvements and bugfixes.

This version is required to compile the TextElite game correctly. Yes, version 1.0 of my text-elite space trader sim conversion is done! It can be found in the Download section under Games! 🚀

@SerErris all planets, their descriptions and properties, and trade markets are procedurally generated like in the original


On 10/9/2020 at 6:57 PM, geek504 said:

After checking out most lexers/parsers it seems to me that Lark is the sweetest thing since apple pie!

So sorry @desertfish for being severely side-tracked from helping you out with Prog8!

I just had to explore Lark a little bit... which ended up being a lot... I just wrote an Integer BASIC (aka Woz's Game BASIC) parser using Python and Lark in a single file of only 160 lines (36 lines for the BASIC program and 111 lines for the grammar), and it parses Woz's BASIC Breakout game without error! A beautiful syntax tree is automatically generated... dare I go on and actually write the 65C02 assembly compiler for it?

LOL! I'm so tempted to do it and check the benchmark against Prog8 et al. 😉


  • 3 weeks later...

I've just released Prog8 5.0

I decided to bump to a new major version because this one fundamentally changed the way arguments are passed to subroutines, and how return values are returned.   It no longer uses the software-eval-stack for that.  This usually results in much smaller code size for programs calling a lot of subroutines (with arguments).

Also the obligatory bugfixes, of course.


1 hour ago, desertfish said:

It no longer uses the software-eval-stack for that.

Hi! Is this "software-eval-stack" you mention a software-based stack used for mathematical computations, i.e. an RPN-type stack? E.g. 2+3 becomes 2, PUSH, 3, PUSH, +? This is what I am using for my BASIC compiler. If so, it does do a lot of function calling and can be greatly optimized if one bypasses the stack entirely, but that involves major compiler modifications as you mentioned. Here is some sample code from my compiler, using macros that greatly improve readability:

Quote


10 REM ASSIGNMENT
20 AX = 3 : REM NEED TO OPTIMIZE
30 V = ABS(AX + 2)
35 Q = 1 + 2 * 6 / 3
40 PRINT "THE END"
50 END

start
  line
    10
    comment    REM ASSIGNMENT
  line
    20
    assignment
      AX
      3
    comment    REM NEED TO OPTIMIZE
  line
    30
    assignment
      V
      abs
        add_exp
          AX
          +
          2
  line
    35
    assignment
      Q
      add_exp
        1
        +
        mul_exp
          mul_exp
            2
            *
            6
          /
          3
  line
    40
    print    "THE END"
  line
    50
    end

Tree('start', [Tree('line', [Token('INT', '10'), Tree('comment', [Token('COMMENT', 'REM ASSIGNMENT')])]), Tree('line', [Token('INT', '20'), Tree('assignment', [Token('VAR_ID', 'AX'), Token('INT', '3')]), Tree('comment', [Token('COMMENT', 'REM NEED TO OPTIMIZE')])]), Tree('line', [Token('INT', '30'), Tree('assignment', [Token('VAR_ID', 'V'), Tree('abs', [Tree('add_exp', [Token('VAR_ID', 'AX'), Token('ADD_OP', '+'), Token('INT', '2')])])])]), Tree('line', [Token('INT', '35'), Tree('assignment', [Token('VAR_ID', 'Q'), Tree('add_exp', [Token('INT', '1'), Token('ADD_OP', '+'), Tree('mul_exp', [Tree('mul_exp', [Token('INT', '2'), Token('MUL_OP', '*'), Token('INT', '6')]), Token('MUL_OP', '/'), Token('INT', '3')])])])]), Tree('line', [Token('INT', '40'), Tree('print', [Token('STRING', '"THE END"')])]), Tree('line', [Token('INT', '50'), Tree('end', [])])])

.include "macros.inc"
.include "header.inc"
.code

L10:        ; REM ASSIGNMENT
L20:        PushInt 3
        PullVar AX
        ; REM NEED TO OPTIMIZE
L30:        PushVar AX
        PushInt 2
        jsr ADD
        jsr ABS
        PullVar V
L35:        PushInt 1
        PushInt 2
        PushInt 6
        jsr UMUL
        PushInt 3
        jsr UDIV
        jsr ADD
        PullVar Q
L40:        LoadAddress S0        ; to r0
        jsr PrString
L50:        rts

S0:        .asciiz "the end"
AX:        .res 2
V:        .res 2
Q:        .res 2

.include "io.asm"
.include "math.asm"
 

As you can see, line 20 does a PUSH and a PULL to/from the stack for a simple AX=3. It could have simply copied the INT 3 directly into VAR AX. I'm planning on writing a post-compiler optimizer much later. Note that I couldn't use VAR A since A is a reserved keyword in ca65!
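
A post-compiler (peephole) pass for exactly that pattern can be quite small. Here's a hedged Kotlin sketch that collapses a "PushInt n" / "PullVar X" pair into a single "CopyIntToVar n, X" macro -- the macro name is invented purely for illustration, and label handling is omitted for brevity:

// Peephole sketch: rewrite "PushInt n" immediately followed by "PullVar X"
// into one direct-store macro, avoiding the eval-stack round trip.
fun peephole(lines: List<String>): List<String> {
    val out = mutableListOf<String>()
    var i = 0
    while (i < lines.size) {
        val cur = lines[i].trim()
        val nxt = lines.getOrNull(i + 1)?.trim()
        if (cur.startsWith("PushInt ") && nxt != null && nxt.startsWith("PullVar ")) {
            val value = cur.removePrefix("PushInt ").trim()
            val target = nxt.removePrefix("PullVar ").trim()
            out += "        CopyIntToVar $value, $target"   // hypothetical macro
            i += 2                                          // consumed both lines
        } else {
            out += lines[i]
            i += 1
        }
    }
    return out
}

fun main() {
    val asm = listOf("        PushInt 3", "        PullVar AX")
    peephole(asm).forEach(::println)   // prints the single collapsed line: CopyIntToVar 3, AX
}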


Exactly like that, @geek504 - the software eval stack of Prog8 is a (low/high byte split) stack indexed by the X register, which functions as a second stack pointer. Expressions like A+B are evaluated just as you described. I've been working on eliminating the need for this stack for intermediary results and trivial expressions and assignments, but for most non-trivial expressions it is still required.

It makes the code generator itself a lot simpler; however, the generated code is quite inefficient in terms of 6502 assembly...
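
To picture the layout: a split eval stack like that can be modelled as two parallel byte arrays (low bytes and high bytes) with one index register acting as the stack pointer. Below is a small Kotlin model of that data structure -- just an illustration of the scheme described above, not how Prog8 actually implements it in 6502 code (growth direction and size are arbitrary choices for this model):

// Model of a split low/high-byte software eval stack indexed by one register (X).
class SplitEvalStack(size: Int = 128) {
    private val lo = ByteArray(size)
    private val hi = ByteArray(size)
    private var x = size              // X acts as a second stack pointer (grows downward here)

    fun push(value: Int) {            // push a 16-bit word as separate lo/hi bytes
        x--
        lo[x] = (value and 0xFF).toByte()
        hi[x] = ((value shr 8) and 0xFF).toByte()
    }

    fun pop(): Int {                  // pop and recombine the two bytes
        val value = (lo[x].toInt() and 0xFF) or ((hi[x].toInt() and 0xFF) shl 8)
        x++
        return value
    }
}

fun main() {
    // A + B evaluated RPN-style: push, push, add
    val stack = SplitEvalStack()
    stack.push(1000)
    stack.push(234)
    stack.push(stack.pop() + stack.pop())
    println(stack.pop())              // 1234
}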


13 hours ago, desertfish said:

It makes the code generator itself a lot simpler; however, the generated code is quite inefficient in terms of 6502 assembly...

@desertfish I started coding my compiler without a stack, and while I can say it was efficient, it was just too slow (edit: to code and finish the compiler) and prone to bugs... I decided to implement the stack midway just to get the compiler ready, and then re-implement the non-stack improvements later, if at all. I'm guessing a rough 10-15% speed improvement and am not sure if it is worth the effort right now. 6502 assembly is inefficient by nature (but very simple to implement), especially if we write proper assembly code to preserve A, X, and Y, wasting bytes and cycles with PHA, PHX, PHY and PLY, PLX, PLA around every subroutine. I do feel that this is still more efficient than cc65's C-stack implementation though.

I am not worrying too much about maximum efficiency (I don't think we will ever get close to super tight assembly code) because I hope that one day we will be able to crank up the X16's clock from the MHz to the GHz range!

At least in the emulator scene, we could implement something like the following 6502 JIT core targeting x64, which could bring us to a realistic 12 GHz!

https://scarybeastsecurity.blogspot.com/2020/04/clocking-6502-to-15ghz.html?m=1

[screenshot: beebjit 15 GHz benchmark result]


10 hours ago, geek504 said:

(I don't think we will ever get close to super tight assembly code)

Indeed, especially not with my compiler writing skills 😅

That's why I included the inline-assembly block in prog8 from the start: to make it easy and fairly transparent to write critical pieces of code in hand-tuned assembly.


  • 2 weeks later...

I've just released Prog8 5.1

It contains a few bugfixes (such as making breakpoints in Vice work again), but the bulk of the changes are improvements to the generated assembly code.

Certain array accesses, value assignments, and value comparisons have been optimized, and the result is that the "plasma" benchmark from the millfork-benchmarks project now runs at around the same speed as the optimized CC65 version.

 

Note: I made an error in the setting of the graphics mode in this release. The VIC YSCROLL register ($d011) is not set correctly. You can fix this in your own code or wait for the next Prog8 version to be released... Sorry 🙂


I've just released Prog8 5.2

It contains a few important bug fixes (some value comparisons were done wrong, for instance), and the silly YSCROLL screen issue for the graphics bitmap screen mode on the C64 that crept into the 5.1 release is also fixed.

Note: starting with this release, you need Java version 11 to run the compiler.

