Correct user typos in a C source file using Perl 6 grammar

To learn about Perl 6 grammars, I wrote a simple parser for C source files. It is a constructed example, motivated by this Stack Overflow question: https://stackoverflow.com/q/49020238/2173773. The goal is to parse a C source file and correct user typos in argument lists, where arguments were separated by a space instead of a comma.

#! /usr/bin/env perl6
#
# Correct user typos in a C source file according to the specification
# given in this question:
#
#   https://stackoverflow.com/q/49020238/2173773
#
use v6;

# Define some test cases...
my $str = q:to/END/;
char *what = "Perl6 grammar example (parsing a C source file)";
foo(bar baz);
foo(bar baz, bak);
foo(bar baz, foo(bar baz)); # recursive call to foo(bar baz)
foo( "bar baz");
foo("bar baz" bak);
foo("bar baz" bak "123 (abc d)", 5);
END

class C-simple-actions {
    method string-constant ($/) {
        make ~$/;
    }

    method make-data ($match-data) {
        $match-data.make( [~] $match-data.chunks.map: {$_.value.?made // $_.value} );
    }
    method TOP ($/) {
        self.make-data( $/ );
    }

    method all-text ($/) {
        self.make-data( $/ );
    }

    method func-call ($/) {
        self.make-data( $/ );
    }

    method normal-text ($/) {
        self.make-data( $/ );
    }

    method argument ($/) {
        self.make-data( $/ );
    }
    method white-space ($/) {
        make ~$/;
    }

    method arg-separator ($match-data) {
        my $str = $match-data.Str;
        if $str ~~ /^ \s+ $/ {
            $match-data.make(", ");
        }
        else {
            $match-data.make( $str );
        }
    }
}

grammar C-simple {
    token TOP { <all-text> }
    token all-text { [<normal-text> <func-call>]* <normal-text>?}

    # The 'normal-text' token represents:
    #   1. String constants (these can include internal parentheses '(...)' that
    #        will not be recognized as 'func-call' tokens, as defined below).
    #        E.g.: char *what = "Perl6 grammar example (parsing a C source file)";
    #        The parentheses in that string constant are not a 'func-call' token.
    #      Note: string constants in comments will be included in normal-text.
    #      Note: string constants can also occur inside 'func-call' tokens.
    #   2. All other code, except 'func-call' tokens (as defined below).
    #      Note: This simplified parser does not distinguish between
    #        code and comments (i.e. // ... or /* ... */ ).
    #        This means that a parenthesized group inside a comment is also
    #        parsed as a 'func-call' token, not as part of 'normal-text'.
    token normal-text { [<!before '('> [<string-constant> || .] ]+ }

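    # A C 'string-constant' token is a string delimited by double quotes. It
    # can contain internal double quotes when they are preceded by a backslash
    # escape character, and it can end in a backslash if that backslash is
    # escaped with another backslash, i.e. "Hello\\" ...
    # Note: in a single-quoted literal '\\' is ONE backslash, so an escaped
    # backslash is written '\\\\' and must be tried before '\"'.
    # '*' rather than '+' also accepts the empty string "".
    token string-constant { '"' ['\\\\' || '\"' || <-["]>]* '"' }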

    # Note: this does not parse real C code. It assumes that function
    # arguments are separated either by whitespace (user typo) or by
    # a comma. This has the obvious flaw that, e.g., func( 2 + 3 )
    # will be interpreted as a typo and corrected to func(2, +, 3),
    # although it most likely is not a user typo.
    token func-call {
        '(' <white-space>?
        [<argument> <arg-separator>]* <argument>
        <white-space>? ')'
    }

    # Note: allows for nested function calls inside the argument.
    token argument {
        [<!before <arg-separator>>
         [<string-constant> || <func-call> || <-[()]>] ]+  
    }
    token white-space { \s+ }

    token arg-separator { \s+ || [\s* ',' \s*] }

}

print $str;
my $result = C-simple.parse( $str, actions => C-simple-actions.new);
say "-" x 80;
print $result.made;

Although this parser seems to work according to the specification, I would like to know whether there is a simpler way to write the action class C-simple-actions of the grammar. Currently, most of its methods only propagate the change " " --> ", " upwards until it reaches the TOP method. This seems unnecessarily verbose to me. Could it be simplified?
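To make the propagation concrete, here is a minimal sketch of one possible alternative, not a tested replacement for the action class above. Instead of one `make` method per token, a single recursive sub walks the finished match tree with `.chunks` and rewrites only the `arg-separator` nodes. The grammar `Mini` and the sub `render` are hypothetical names introduced for this illustration. `Mini` is a cut-down stand-in that reuses the token names from the post but not their full definitions.

```raku
use v6;

# Hypothetical cut-down grammar; only the token names are taken from the post.
grammar Mini {
    token TOP           { <name> <func-call> ';' }
    token name          { \w+ }
    token func-call     { '(' [<argument> <arg-separator>]* <argument> ')' }
    token argument      { \w+ }
    token arg-separator { \s* ',' \s* || \s+ }
}

# Rebuild the text from a match tree. .chunks yields pairs whose key is the
# capture name, or '~' for text that no capture covers.
sub render ($match) {
    $match.chunks.map(-> $chunk {
        $chunk.key eq '~'
            ?? ~$chunk.value                       # uncaptured text: copy as-is
            !! $chunk.key eq 'arg-separator'
                ?? (~$chunk.value ~~ /^ \s+ $/ ?? ', ' !! ~$chunk.value)
                !! render($chunk.value)            # any other capture: recurse
    }).join
}

my $tree = Mini.parse('foo(bar baz, bak);');
say render($tree);
```

For the sample input, the intent is that the whitespace-only separator between `bar` and `baz` becomes `", "`, while the existing `", "` before `bak` is copied unchanged. The trade-off in this sketch is that the rewrite rule lives in one sub instead of the action class, so it no longer needs a method for every token that merely passes text upward.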

edited Nov 27, 2019 at 2:56

rolfl

asked Mar 8, 2018 at 9:58

Håkon Hægland

How is the performance of this? Last time I used Perl6 it was incredibly slow, especially compared to Perl5.

– yuri, Mar 8, 2018 at 10:10

Hi yuri. I think the performance has improved a lot in recent years. See for example this post.

– Håkon Hægland, Mar 8, 2018 at 10:15

@yuri Rakudo compiles modules into bytecode when loading them for the first time, significantly improving its performance.

– Mimosinnet, Sep 10, 2019 at 11:22
