Compilers

NAVIGATION
CATEGORIES
REFERRENCE
LINKS
  • Why LL(1) Parsers do not support left recursion?

    29 answers - 433 bytes - related search similar search Add To My Delicious Add To My Stumble Upon Add To My Google Mark Add To My Facebook Add To My Digg Add To My Reddit

    Hi, first of all, I'm not an expert in the theory of computation.
    I've read about LL(1) parsers and I have seen that they do not support
    left recursion, because it is said that it would lead to infinite
    recursivity.
    My question: why is that? In which case a LL(1) parser can enter
    infinite recursivity?
    I can't find a good example that demonstrates that.
    Thanks!
  • No.1 | | 771 bytes | |

    Hi, first of all, I'm not an expert in the theory of computation.

    I've read about LL(1) parsers and I have seen that they do not support
    left recursion, because it is said that it would lead to infinite
    recursivity.

    My question: why is that? In which case a LL(1) parser can enter
    infinite recursivity?

    I can't find a good example that demonstrates that.

    expr ::= expr "+" expr;
    expr ::= '[0-9]+';

    K, using a top-down approach, parse:

    5+2

    As soon as you enter expr, you enter expr, which brings you to expr, which
    brings you to expr, after which you enter expr,

    And it only gets deeper.

    That can be removed by factoring*, which is an exercise I leave to the
    reader.
  • No.2 | | 889 bytes | |

    Quinn Tyler Jackson schrieb:

    >My question: why is that? In which case a LL(1) parser can enter
    >infinite recursivity?
    >>

    >I can't find a good example that demonstrates that.
    >

    expr ::= expr "+" expr;
    expr ::= '[0-9]+';

    K, using a top-down approach, parse:

    5+2

    As soon as you enter expr, you enter expr, which brings you to expr, which
    brings you to expr, after which you enter expr,

    This example applies to both LL and LR parsers. It should be clear that
    a parser generator should handle such cases appropriately.

    In the case of an LL parser, a recursive call can occur only after a
    token has been consumed. An LR parser also should not perform an epsilon
    move, if something else is possible as well.

    DoDi

  • No.3 | | 1177 bytes | |

    Sun, 2006-07-16 at 10:44 -0400, Quinn Tyler Jackson wrote:
    * Factoring gets you around the issue in a known and legitimate way, but
    produces parse trees with many intermediate nodes that must be traversed
    before you get to the node type of the actual expression. Something like:

    expr
    power
    mult
    div
    add
    number
    integer

    Where what you really only need is:

    expr
    integer

    I have found that trimming the junk intermediate nodes "sooner rather than
    later" makes for easier traversal during tree interpretation. YMMV.

    JJTree (comes with JavaCC) uses a notation for doing this called
    conditional node descriptors, e.g. the "#AdditiveExpression(>1)" below:

    void AdditiveExpression() #AdditiveExpression(>1):
    {}
    {
    MultiplicativeExpression() (( "+" | "-" ) MultiplicativeExpression())*
    }

    ensures that AdditiveExpression nodes will only be created if they have
    more than one child node.

    I wrote a fair bit of ugly AST trimming code - which I shamefacedly
    deleted after re-reading the documentation and discovering this
    feature :-)

    Yours,

    Tom
  • No.4 | | 616 bytes | |

    "DevInstinct" <martin_lapierre@videotron.cawrote:
    # Hi, first of all, I'm not an expert in the theory of computation.
    #
    # I've read about LL(1) parsers and I have seen that they do not support
    # left recursion, because it is said that it would lead to infinite
    # recursivity.

    LL(k) grammars make parsing decisions based on the first k symbols.

    For
    S ::= Sb | a
    FIRST(Sb,1) = FIRST(S,1) + FIRST(a,1) = {a}
    FIRST(a,1) = {a}

    So both alternatives have the same FIRST(1) sets, and you can't decide
    which alternative to use. It's not an LL(1) grammar.
  • No.5 | | 1678 bytes | |

    I think the real problem here can be appreciated with a slightly more
    complex example:

    S ::= Ab | Ac
    A ::= Aa | <epsilon>

    Here, the grammar cannot be parsed by a LL(k) parser for any finite k,
    as I can always construct a sentence in the language that needs more
    than k lookahead to make that decision on the first production from
    'S'.

    A sentence beginning with k a's needs to look at at least (k+1) symbols
    to make the decision on which rule to use to expand 'S'. And expanding
    'S' to one of the alternatives is the first step a top down parser has
    to do.

    The grammar above is just the regular expression a*(b|c). It can easily
    be re-written to make it friendly to LL parsing.

    An LR (or LALR) parser does bottom up parsing. The reduction to 'S'
    from one of the alternatives is that last step for it, and hence it can
    handle such a grammar just fine. (Try it with yacc).

    Hope this helps,
    Rahul

    SM Ryan wrote:
    "DevInstinct" <martin_lapierre@videotron.cawrote:
    # Hi, first of all, I'm not an expert in the theory of computation.
    #
    # I've read about LL(1) parsers and I have seen that they do not support
    # left recursion, because it is said that it would lead to infinite
    # recursivity.

    LL(k) grammars make parsing decisions based on the first k symbols.

    For
    S ::= Sb | a
    FIRST(Sb,1) = FIRST(S,1) + FIRST(a,1) = {a}
    FIRST(a,1) = {a}

    So both alternatives have the same FIRST(1) sets, and you can't decide
    which alternative to use. It's not an LL(1) grammar.
  • No.6 | | 2145 bytes | |

    "DevInstinct" <martin_lapierre@videotron.caasked:
    My question: why is that? In which case a LL(1) parser can enter
    infinite recursivity?

    I can't find a good example that demonstrates that.

    Quinn Tyler Jackson answered:
    expr ::= expr "+" expr;
    expr ::= '[0-9]+';

    K, using a top-down approach, parse:

    5+2

    As soon as you enter expr, you enter expr, which brings you to expr, which
    brings you to expr, after which you enter expr,

    Hans-Peter Diettrich <DrDiettrich1@aol.comchallenged:
    This example applies to both LL and LR parsers. It should be clear that
    a parser generator should handle such cases appropriately.

    In the case of an LL parser, a recursive call can occur only after a
    token has been consumed.

    An LL parser generator cannot handle the case correctly, as the FIRST
    of both alternatives is 0-9. Thus, while it would be nice if an LL
    parser would wait to consume a token before making a recursive call,
    the LL algorithm requires one to make the recursive call before
    consuming the token, because making the call is necessary to allow the
    called procedure to consume the token. It also isn't a solution
    simply not to make the recursive call, for if you don't make the
    recursive call, you simply have eliminated the recursive rule from the
    grammar. If you simply wait to make the recursive call until after
    the token has been consumed, you have changed the grammar in another
    way.

    The point is that in an LL parser, you need to determine by looking at
    the lookahead (FIRST sets in this case) how many calls must be made,
    because you must make exactly the right series of calls before
    consuming the token.

    However, Dodi's assertion that LR parsers should (will) properly
    handle this case is correct.

    Hope this helps,
    -Chris

    Chris Clark Internet : compres@world.std.com
    Compiler Resources, Inc. Web Site : http://world.std.com/~compres
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

  • No.7 | | 2315 bytes | |

    Chris F Clark schrieb:

    An LL parser generator cannot handle the case correctly, as the FIRST
    of both alternatives is 0-9.

    What's "correct" handling of such a case? LR parser generators also use
    heuristics in ambiguous situations.

    Thus, while it would be nice if an LL parser would wait to consume a
    token before making a recursive call, the LL algorithm requires one
    to make the recursive call before consuming the token, because
    making the call is necessary to allow the called procedure to
    consume the token.

    But the procedure already *has* been called, so it IM is allowed to
    consume the token.

    It also isn't a solution simply not to make the recursive call, for
    if you don't make the recursive call, you simply have eliminated the
    recursive rule from the grammar. If you simply wait to make the
    recursive call until after the token has been consumed, you have
    changed the grammar in another way.

    I agree that parser generators tend to change the grammar, in order to
    resolve conflicts. The sample grammar contains no indication about the
    grouping of the operation, but what does that *mean*? Does it mean
    that the grammar is useless, invalid, or simply ambiguous? Provided
    that the grammar is considered to be ambiguous, a parser generator
    either can reject it, or do whatever it likes.

    Let me give a more simplified grammar:

    expr ::= expr;
    expr ::= '[0-9]+';

    Here an LL parser might enter infinite recursion *before* consuming a
    token, whereas an LR parser might recurse *after* consuming a token.

    The point is that in an LL parser, you need to determine by looking at
    the lookahead (FIRST sets in this case) how many calls must be made,
    because you must make exactly the right series of calls before
    consuming the token.

    However, Dodi's assertion that LR parsers should (will) properly
    handle this case is correct.

    It's not a matter of the parser, but instead of the parser generator. I
    see no reason why a parser generator should be able to create an LR
    parser for the original grammer, but not an LL parser. And, for
    completeness, a parser generator IM should reject the grammar anyhow.

    DoDi
  • No.8 | | 999 bytes | |

    "DevInstinct" == DevInstinct <martin_lapierre@videotron.cawrites:

    DevInstinctMy question: why is that? In which case a LL(1)
    DevInstinctparser can enter infinite recursivity?

    DevInstinctI can't find a good example that demonstrates that.

    Consider the grammar

    term ::= '[0-9]+';
    expr ::= expr '+' term |
    term;

    and the expression

    1 + 2 + 3 + 3 + 5

    Ideally, this should be parsed into an abstract syntax tree that looks
    something like:

    add(add(add(add(1, 2), 3), 4), 5)

    Here's the problem. When you're handling the 1, you don't know
    whether to recurse into the "expr '+' term" or terminate on the
    "term". You actually should recurse four times and then terminate,
    but you can't know that without looking way ahead to the end of the
    expression, which can be arbitrarily far away, and you're only allowed
    one token of lookahead in LL(1).

    Andru
  • No.9 | | 6282 bytes | |

    Preamble: excuse this note for being excessively harsh. Dodi normally
    posts correct information, and as such deserves to be held to a high
    standard, cognizant of the respect he has earned. However, he pushed
    a hot button of mine by suggesting that left-recursion was some
    hueristic thing. At the same time I made a mistake when originally
    writing this assuming that Quinn's grammar was unambiguous, which it
    is not. I read Quinn's grammar and assumed it was Andru's. It is a
    mistake I commonly make because there are hueristics that change
    Quinn's grammar into Andru's, and these hueristics are commonly
    applied by LR parser generators and camoflague the fact that Quinn's
    grammar is actually ambiguous.

    Quinn wrote (q>):

    qexpr ::= expr "+" expr;
    qexpr ::= '[0-9]+';

    Andru corrected (a>);

    aterm ::= '[0-9]+';
    aexpr ::= expr '+' term |
    aterm;

    I wrote (>):

    An LL parser generator cannot handle the case correctly, as the FIRST
    of both alternatives is 0-9.

    Dodi replied:
    What's "correct" handling of such a case? LR parser generators also use
    heuristics in ambiguous situations.

    Now, I mistook Quinn's ambiguous grammar for Andru's correct grammar.
    Thus, for Quinn's gramar you are correct. For Andru's grammar, the
    situation is different.

    Left recursion is not an inherently ambiguous situation. It is
    perfectly valid in LR parsing to use left recursion and it does not
    require heuristics to do so. The correct handling of non-ambiguous
    left recursive cases is perfectly well defined. Ambiguous grammars do
    require heuristics and considering them invalid is perfectly rational.

    However, even for unambiguous left-recursive grammars the LL algorithm
    fails to construct a parser, while the LR algorithm will properly
    construct a parser.

    Thus, while it would be nice if an LL parser would wait to consume a
    token before making a recursive call, the LL algorithm requires one
    to make the recursive call before consuming the token, because
    making the call is necessary to allow the called procedure to
    consume the token.

    But the procedure already *has* been called, so it IM is allowed to
    consume the token.

    Yes, but now it must make a recursive call. It must make the
    recursive call because, if it were the name of a different
    non-terminal it would have to call that non-terminal (and that might
    lead indirectly back to a recursive call on this non-terminal).

    It also isn't a solution simply not to make the recursive call, for
    if you don't make the recursive call, you simply have eliminated the
    recursive rule from the grammar. If you simply wait to make the
    recursive call until after the token has been consumed, you have
    changed the grammar in another way.

    I agree that parser generators tend to change the grammar, in order to
    resolve conflicts. The sample grammar contains no indication about the
    grouping of the operation, but what does that *mean*? Does it mean
    that the grammar is useless, invalid, or simply ambiguous? Provided
    that the grammar is considered to be ambiguous, a parser generator
    either can reject it, or do whatever it likes.

    The problem here is there is no conflict in Andru's LR case, because
    the LR algorithm correctly handles left-recursion. In Quinn's case,
    you are right, there is no indication of the grouping of the operation
    and the grammar is ambiguous. (So, if you are chastising me for
    seeing Andru's grammar while reading Quinn's, I stand perfectly
    chastised.)

    By the way, the definition of correct parse is also not technology
    specific. Every grammar is either ambiguous or has exactly one
    correct parse. Now, some parsing technologies cannot find the correct
    parse for some grammars includes some gramars which there are no
    LL or LR parsers for as well as ones which have LR parsers and not LL
    ones and grammars with LL(k) ones and not LR(1) ones, etc. However,
    that doesn't change the definition of correct parse.'s a rigorous,
    precise, mathematical concept.

    The point is that in an LL parser, you need to determine by looking at
    the lookahead (FIRST sets in this case) how many calls must be made,
    because you must make exactly the right series of calls before
    consuming the token.

    However, Dodi's assertion that LR parsers should (will) properly
    handle this case is correct.

    It's not a matter of the parser, but instead of the parser generator.

    , let me make that more clear. No parser generator executing the LL
    algorithm for constructing a parser, can construct a parser from a
    grammar with left-recursion. A parser generator running any LR
    (e.g. LR(0), SLR, LALR, canonical LR, minimal state LR, etc.)
    algorithm can construct parsers for left-recursive grammars, presuming
    that grammar has no conflicts (and Andru's grammar is conflict-free).

    There is a fundamental difference in the LL and LR parser generator
    algorithms which allow LR parsers to be constructed for left-recursive
    cases, while LL parsers are fundamentally limited from handling such
    cases.

    And that's the nit which is a hot button for me. It's fundamentally
    easier to write an LL parser generator because the algorithm is
    simpler, but handles fewer cases, not that writing an LR parser
    generator is fundamentally that hard. Moreover, recursive descent
    parsers are popular because their output is so "readable". Being able
    to use left-recursion (and the oft maligned precedence and
    associativity declarations) can make the grammar itself simpler and
    more readable. That makes them some of the reasons to prefer LR
    parsing. Therefore, I get overly defensive when those advantages are
    misunderstood or belittled.

    Hope this helps,
    -Chris

    Chris Clark Internet : URL
    Compiler Resources, Inc. Web Site : URL
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)
  • No.10 | | 3807 bytes | |

    Chris noted:

    Quinn's grammar into Andru's, and these hueristics are commonly
    applied by LR parser generators and camoflague the fact that Quinn's
    grammar is actually ambiguous.

    Whoops. Yes. I knew it was also ambiguous, but was only looking at the left
    recursive aspect of it. That's what I get for running off half-loaded with
    my example. A better example might have been:

    list ::= list CMMA item | item;
    item ::= NUMBER | STRING_LITERAL;

    some such. This could be expressed right recursively by changing list to:

    list ::= item CMMA list | item;

    The left recursive form might be the preferred form for a LR parser, since
    it could, in theory, parse a very long list without producing a horrendously
    deep stack in the process, whereas the right recursive form will tend to
    produce a deep stack. The left recursive form precludes parsing by LL
    methods, the right form is acceptable to LL (and still results in a very
    deep stack with very long lists). A typical work around in the LL world is
    extended BNF like this:

    list ::= (item CMMA)+ | item

    Sometimes written

    list ::= {item CMMA} | item

    This produces an iterative parser that doesn't build a deep stack, and is
    also not recursive. (There's nothing to stop an LR parser generator author
    from using those extensions to do the same thing, but since the left
    recursive form does the same thing, there's no perceived need for such
    extensions, perhaps.)

    Now, in a similar tone to "heuristics are commonly applied to camouflage
    the fact", Dodi said (in another message in this thread):

    It's not a matter of the parser, but instead of the parser generator. I
    see no reason why a parser generator should be able to create an LR
    parser for the original grammer, but not an LL parser. And, for
    completeness, a parser generator IM should reject the grammar anyhow.

    It's easy to catch (either direct or indirect) left recursion in an LL
    generator and then generate a build error. This, IM, is the preferred
    method.

    Although in some cases, it's also possible to generate a left-factored
    version of the rule and quietly generate an equivalent grammar that doesn't
    contain the left-recursion, I've always been against invisible fixes of what
    are fundamental grammar construction issues. (Such modifications can produce
    inexplicably deep parse trees that contain nodes that were manufactured by
    the generator to bypass the issue, for example, and I prefer nodes to come
    from explicitly stated rules rather than something the generator dreamed
    up.)

    Adaptive grammars that use an LL algorithm to effect a parse come up with
    another problem with left recursion. It may not be knowable until run time
    (in the middle of a parse) whether or not a grammar is left-recursive:

    x ::= (y|z) CLN list
    y ::= /* something that sets foo to non-lambda */
    z ::= /* something that sets foo to lambda */
    list ::= foo list CMMA item | item;

    x follows the z path, list is left-recursive at run time. Although it is
    technically possible to strictly detect this at grammar-compile time "in the
    simple case", one cannot do it "in the general case" because of Turing
    Power, and foolproof predetermination that foo cannot/can be empty implies a
    solution to the Halting problem. The adaptive(k) algorithm allows list but
    fails at run-time if foo is lambda. (It does this by using a sanity check on
    stack depth when list is entered; if foo is empty, that sanity check will
    fail when the stack quickly hits the upper limit).

    (Please note that I wrote this email in haste -- all blunders my own.)
  • No.11 | | 1884 bytes | |

    Chris F Clark schrieb:

    Preamble: excuse this note for being excessively harsh. Dodi normally
    posts correct information, and as such deserves to be held to a high
    standard, cognizant of the respect he has earned. However, he pushed
    a hot button of mine by suggesting that left-recursion was some
    hueristic thing

    Let me summarize my points (John, you may drop my long answer ;-)

    1. LL parsers cannot handle left recursion, whereas LR parsers cannot
    handle right recursion.

    2. Most languages are ambiguous at the syntax level, so that implicit
    disambiguation (longest match) or explicit semantical constraints
    must be introduced. (see: dangling else).

    3. LL(1) recursive descent parsers are readable, that's why no
    LL(k) parser generators exist, in contrast to LR(k) parser generators.

    4. When at least one terminal must be consumed, before a recursive
    invocation of a rule, no infinite recursion can occur. (every loop will
    terminate after all terminals in the input stream have been consumed)

    Any objections, so far?

    Ad (1+2): We should keep grammars apart from languages. Most languages
    require recursive grammars, but allow for both left or right recursive
    grammars.
    Languages with "expr '+' expr" or "list ',' list" can be parsed in
    multiple ways. Unless there exist additional (semantical) restrictions,
    correct and equivalent left or right recursive grammars can be
    constructed for such languages.
    And when a human is allowed to disambiguate a grammar for such a
    language himself, a parser generator should be allowed to do the same ;-)

    Ad (3): This is the only reason for the preference of LR parsers. Why
    spend time with tweaking a grammar to LR(1)/LL(1), when a parser
    generator for LR(k) is available?

    DoDi
  • No.12 | | 1687 bytes | |

    Chris F Clark <cfc@shell01.TheWorld.comwrites:

    It's not a matter of the parser, but instead of the parser generator.

    , let me make that more clear. No parser generator executing the LL
    algorithm for constructing a parser, can construct a parser from a
    grammar with left-recursion. A parser generator running any LR
    (e.g. LR(0), SLR, LALR, canonical LR, minimal state LR, etc.)
    algorithm can construct parsers for left-recursive grammars, presuming
    that grammar has no conflicts (and Andru's grammar is conflict-free).

    You seem to be conceding too much here. The essential difference is
    in the parsers, not just the parser generators. Recall (as I
    explained in )
    that an LR parser is a shift/reduce parser whereas an LL parser is a
    confirm/expand parser. Each also has a control that decides at each
    step which of the two possible actions to take (whether to shift or
    reduce in an LR parser, whether to confirm or expand in an LL parser).
    The parser generator only influences that control function, not what
    the two possible actions are. Yet the issue with left recursion stems
    directly from the available actions, not from the control. For a
    left-recursive grammar, no control -- no matter how it works or how it
    was generated -- can possibly decide between confirming and expanding
    given only the previous input and a bounded amount of lookahead into
    the future input. Deciding between shifting and reducing, on the
    other hand is possible.
    -Max Hailperin
    Professor of Computer Science
    Chair, Mathematics and Computer Science Department
    Gustavus Adolphus College
    URL
  • No.13 | | 642 bytes | |

    Hans-Peter Diettrich wrote:
    Let me summarize my points (John, you may drop my long answer ;-)

    1. LL parsers cannot handle left recursion, whereas LR parsers cannot
    handle right recursion.

    I'm not sure I understand clearly What do you mean by "LR parsers
    cannot handle right recursion"?

    There is no LALR(1) conflict in the following "right recursive" grammar
    (SableCC-like syntax):

    rht_rec_rule =
    {one} some token |
    {many} some token rht_rec_rule;

    As far as I remember, LR parsers can deal with both left and right
    recusrsive grammars. Maybe you meant something else?

    Etienne
  • No.14 | | 4104 bytes | |

    # 1. LL parsers cannot handle left recursion, whereas LR parsers cannot
    # handle right recursion.

    A left recursive grammar is not an LL(k) grammar, but the grammar can
    be mechanically transformed to rid it of left recursion. The resulting
    grammar might still not be LL(k).

    LR(k) can handle any deterministic grammar. Left recursive productions
    only need one production on the stack; right recursion piles up the
    entire recursive nest on the stack and then reduces all of them at
    once: right recursion requires a deeper stack.

    # 2. Most languages are ambiguous at the syntax level, so that implicit
    # disambiguation (longest match) or explicit semantical constraints
    # must be introduced. (see: dangling else).

    poorly designed programming languages are ambiguous. (Natural
    languages are ambiguous and don't use deterministic language theory.)

    Many programming language grammars are ambiguous because the people
    writing the grammars are lazy and/or uneducated in these matters. The
    dangling else nonproblem was solved some 40 years ago. Anyone who
    thinks this is still a problem is just too lazy to write a well known
    solution.

    # 3. LL(1) recursive descent parsers are readable, that's why no
    # LL(k) parser generators exist, in contrast to LR(k) parser generators.

    Recursive descent is not LL(k). Recursive descent is an implementation
    technique not a language class. There are recursive ascent parsers for
    LR(k) grammars. LR(k) parsers can be written as recursive finite state
    transducers, with right recursion and embedding requiring recursion
    and left recursion merely looping; if a language uses left recursion
    only (type iii), the LR(k) state diagram is easily convertible to a
    finite transducer for the language.

    # 4. When at least one terminal must be consumed, before a recursive
    # invocation of a rule, no infinite recursion can occur. (every loop will
    # terminate after all terminals in the input stream have been consumed)

    Confusing implementation with language class. Any grammar that
    includes a rule such as A -A | empty is ambiguous therefore
    nondeterministic therefore not LR(k) therefore not LL(k). There are
    many other things that keep a grammar or language from being LL(k);
    LL(k) does not include all deterministic languages.

    # Ad (1+2): We should keep grammars apart from languages. Most languages
    # require recursive grammars, but allow for both left or right recursive
    # grammars.

    A language definition that does not depend on vague handwaving (one or
    two such definitions actually exist) bases the semantics on the parse
    tree. Since right and left recursion build different parse trees, this
    issue is very important in definitions with riguous semantics.

    # Languages with "expr '+' expr" or "list ',' list" can be parsed in
    # multiple ways. Unless there exist additional (semantical) restrictions,
    # correct and equivalent left or right recursive grammars can be
    # constructed for such languages.

    just write an unambiguous production. It's not any harder to do it
    right than to do it lazy and wrong.

    If the semantics of a subtract production are the value of the right
    subtree is subtracted from the value of left subtree, then
    3 - 2 - 1
    with left recursion is
    = (3 - 2) - 1 = 1 - 1 = 0
    with right recursion is
    = 3 - (2 - 1) = 3 - 1 = 2

    Rather different answers but unless your software is controlling a
    satellite to Venus, I guess sloppiness can be repaired in the next
    patch release.

    Algol-60 used left recursion except exponentiation so that the value
    could be determined from the parse tree without a lot misreadable
    prose.

    # And when a human is allowed to disambiguate a grammar for such a
    # language himself, a parser generator should be allowed to do the same ;-)

    hy bother inserting ambiguity and then remove it again with obscure
    rules? Eschew ambiguity from the onset.
  • No.15 | | 4991 bytes | |

    Hans-Peter Diettrich <DrDiettrich1@aol.comwrites:

    1. LL parsers cannot handle left recursion, whereas LR parsers cannot
    handle right recursion.

    2. Most languages are ambiguous at the syntax level, so that implicit
    disambiguation (longest match) or explicit semantical constraints
    must be introduced. (see: dangling else).

    3. LL(1) recursive descent parsers are readable, that's why no
    LL(k) parser generators exist, in contrast to LR(k) parser generators.

    4. When at least one terminal must be consumed, before a recursive
    invocation of a rule, no infinite recursion can occur. (every loop will
    terminate after all terminals in the input stream have been consumed)

    Any objections, so far?

    Yes.

    Point 1 is incorrect; LR parsers can handle right recursion in
    addition to left recursion.

    Point 2 is incorrectly stated, at the least. You seem to have meant
    not that the languages are ambiguous, but rather than particular
    grammars that are commonly used for those languages are ambiguous.
    Taking the example of the dangling else problem, there are unambiguous
    grammars for this as well as ambiguous ones. The published reference
    grammar for Java, for example, is unambiguous, rather than relying
    upon an extragrammatical disambiguation rule. (Note that such a
    disambiguation rule is not semantic, contrary to what you say, just a
    portion of the syntax specification that may, in some cases, be
    presented extragrammatically.) Therefore, it seems that your point 2
    boils down to a claim that "most" of the time the specifiers of
    programming language syntaxes find it more convenient to use an
    ambiguous grammar coupled with extragrammatical disambiguation rules,
    rather thon to use an unambiguous grammar. This is an empirical
    question about the frequency with which different notational
    approaches are used, which I would be hesitant to try to answer just
    based on what I personally have seen. It certainly is not an
    implausible claim, however.

    Point 3 can be read two ways; both are incorrect. You may be claiming
    that recursive descent parsers are only are readable when based on
    LL(1) grammars, or you may be making the stronger claim that no parser
    other than an LL(1)-recursive-descent parser is readable, including no
    non-recursive-descent parser. By refuting the weaker claim, I can
    show that both are false. There are readable recursive descent
    parsers that are not based on LL(1) grammars, both ones that are based
    on LL(k) grammars for k>1, and ones that are not based on any LL(k)
    grammar, but rather require general unbounded backtracking. Taking
    the simpler case of k>1, consider, for example, the following grammar,
    which is LL(2) but not LL(1):

    S -a b S | a c

    The recursive descent parser for this can be written quite readably.
    As some evidence of that, I append to the end of this posting a Java
    program that embodies such a parser. (It isn't the world's greatest
    Java style, but I think it still makes the point.)

    Ad (3): This is the only reason for the preference of LR parsers. Why
    spend time with tweaking a grammar to LR(1)/LL(1), when a parser
    generator for LR(k) is available?

    No, the preference for LR parsers over LL has little to do with point
    3, not only because point 3 is incorrect, but also because you are
    here addressing the k>1 case, whereas most practical parsers use k=1
    in any case. The issue is that LR(1) parsers are more powerful than
    LL(1); the grammars suitable for LR(1) parsing are a proper superset
    of those suitable for LL(1) parsing.
    -Max Hailperin
    Professor of Computer Science
    Chair, Mathematics and Computer Science Department
    Gustavus Adolphus College
    URL

    a Java LL(2) recursive descent parser follows

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IException;

    public class LL2 {
    // a simple LL(2) recursive descent parser
    // using individual bytes of input as tokens (no lexical analysis)

    private static boolean S(BufferedReader input) throws IException {
    input.mark(2); // prepare to put back up to 2 characters

    // try production 1
    if(input.read() == 'a' && input.read() == 'b')
    return S(input);
    else
    input.reset();

    // try production 2
    if(input.read() == 'a' && input.read() == 'c')
    return true;
    else
    input.reset();

    // no more productions; failure
    return false;
    }

    public static void main(String[] args){
    try{
    BufferedReader in = new BufferedReader(new FileReader(args[0]));
    if(S(in) && in.read() == -1)
    System.out.println("Input succesfully parsed.");
    else
    System.out.println("Input was not in the language.");
    } catch(Exception e) {
    System.err.println(e);
    }
    }
    }
  • No.16 | | 617 bytes | |

    SM Ryan <wyrmwif@tsoft.orgwrote:

    # 3. LL(1) recursive descent parsers are readable, that's why no
    # LL(k) parser generators exist, in contrast to LR(k) parser generators.

    What about ANTLR, or spirit in boost.? Both generate parsers
    ANTLR in the classical sense that it generates code to be compiled
    like yacc/bison does, and spirit is an embedded template based parser
    generator that lets the C++ compiler itself generate the grammar from
    the EBNF. There are other LL(k) for fixed k parser generators as
    well, but these are the ones I have some familiarity with.
  • No.17 | | 7710 bytes | |

    SM Ryan schrieb:

    # 1. LL parsers cannot handle left recursion, whereas LR parsers cannot
    # handle right recursion.

    A left recursive grammar is not an LL(k) grammar, but the grammar can
    be mechanically transformed to rid it of left recursion. The resulting
    grammar might still not be LL(k).

    LR(k) can handle any deterministic grammar. Left recursive productions
    only need one production on the stack; right recursion piles up the
    entire recursive nest on the stack and then reduces all of them at
    once: right recursion requires a deeper stack.

    Thanks for your explanations, but I'm still not fully convinced ;-)

    So far I only used CoCo/R to create recursive descent parsers, so that
    I had to handle all non-LL(1) cases manually. Perhaps hereby I have
    applied some modifications to the grammar, when I made work e.g. my C
    parser without problems with left recursive rules.

    That's why I thought that LR parsers have similar problems with right
    recursion (epsilon moves?), where a parser generator also would have
    to apply some built-in rules, in order to resolve these problems. But
    this only is a feeling, I'm not very familiar with LR parsers, because
    I couldn't yet find a working parser generator with Pascal output.

    At least I understand now, that right recursion should be *avoided* in
    LR grammars, in order to keep the stack size low.

    # 2. Most languages are ambiguous at the syntax level, so that implicit
    # disambiguation (longest match) or explicit semantical constraints
    # must be introduced. (see: dangling else).

    poorly designed programming languages are ambiguous. (Natural
    languages are ambiguous and don't use deterministic language theory.)

    Counterexample:
    The argument list of a subroutine is nothing but a list, which can be
    expressed using either left or right recursion in a grammar.

    Perhaps I misused the term "ambiguous" here, when I meant that different
    parse trees can be constructed for a sentence of a language?

    Many programming language grammars are ambiguous because the people
    writing the grammars are lazy and/or uneducated in these matters. The
    dangling else nonproblem was solved some 40 years ago. Anyone who
    thinks this is still a problem is just too lazy to write a well known
    solution.

    The dangling else problem can be solved by adding implicit general rules
    to the interpretation of a language (or grammar?). course there exist
    ways to prevent the occurence of such problems, just in the language
    specification. But AFAIR it's impossible to prove, in the general case,
    that a language is inambigous.

    # 3. LL(1) recursive descent parsers are readable, that's why no
    # LL(k) parser generators exist, in contrast to LR(k) parser generators.

    Recursive descent is not LL(k). Recursive descent is an implementation
    technique not a language class.

    , but what's the relationship between leftmost/rightmost derivation
    and a language?

    There are recursive ascent parsers for
    LR(k) grammars. LR(k) parsers can be written as recursive finite state
    transducers, with right recursion and embedding requiring recursion
    and left recursion merely looping; if a language uses left recursion
    only (type iii), the LR(k) state diagram is easily convertible to a
    finite transducer for the language.

    I'm not sure what you want to tell me. AFAIR LR(k) (languages?
    grammars?) can be transformed into LR(1), for which a finite state
    machine can be constructed easily. I assume that a transformation from
    LL(k) into LL(1) would be possible as well, using essentially the same
    transformation algorithms.

    My point is that table driven parsers are unreadable, due to the lost
    relationship between the parser code and the implemented language or
    grammar. Do there exist LR parsers or parser generators at all, which
    preserve the relationship between the parser code and the grammar?

    # 4. When at least one terminal must be consumed, before a recursive
    # invocation of a rule, no infinite recursion can occur. (every loop will
    # terminate after all terminals in the input stream have been consumed)

    Confusing implementation with language class. Any grammar that
    includes a rule such as A -A | empty is ambiguous therefore
    nondeterministic therefore not LR(k) therefore not LL(k).

    I wanted to present an proof, whether a given grammar is (non-?)
    deterministic, regardless of the reason for that property.

    There are many other things that keep a grammar or language from
    being LL(k); LL(k) does not include all deterministic languages.

    # Ad (1+2): We should keep grammars apart from languages. Most languages
    # require recursive grammars, but allow for both left or right recursive
    # grammars.

    A language definition that does not depend on vague handwaving (one or
    two such definitions actually exist) bases the semantics on the parse
    tree. Since right and left recursion build different parse trees, this
    issue is very important in definitions with riguous semantics.

    With regards to programming languages, there exist many constructs that
    do not impose or require the construction of an specific (unique) parse
    tree. As long as the parser must not know about the definition of an
    identifier, the placement of the definitions in an parse tree is
    irrelevant, when it doesn't change the semantics of the language
    (visibility constraints).

    # Languages with "expr '+' expr" or "list ',' list" can be parsed in
    # multiple ways. Unless there exist additional (semantical) restrictions,
    # correct and equivalent left or right recursive grammars can be
    # constructed for such languages.

    just write an unambiguous production. It's not any harder to do it
    right than to do it lazy and wrong.

    If the semantics of a subtract production are the value of the right
    subtree is subtracted from the value of left subtree, then
    3 - 2 - 1
    with left recursion is
    = (3 - 2) - 1 = 1 - 1 = 0
    with right recursion is
    = 3 - (2 - 1) = 3 - 1 = 2

    This is a property of the asymmetric subtraction operation, which
    doesn't apply to the symmetric addition or multiplication operations.
    course it's a good idea to enforce a unique sequence of *numerical*
    operations in program code, whereas in mathematical formulas such
    additional restrictions should *not* be built into a grammar.

    Rather different answers but unless your software is controlling a
    satellite to Venus, I guess sloppiness can be repaired in the next
    patch release.

    No doubt, but IM you want to introduce more restrictions than required.
    A compiler is allowed to apply certain *valid* transformations on an
    parse tree, in so far I cannot see a reason why any grammar or parser
    for that language must enforce the construction of one-and-only-one
    valid parse tree.

    Algol-60 used left recursion except exponentiation so that the value
    could be determined from the parse tree without a lot misreadable
    prose.

    # And when a human is allowed to disambiguate a grammar for such a
    # language himself, a parser generator should be allowed to do the same ;-)

    hy bother inserting ambiguity and then remove it again with obscure
    rules? Eschew ambiguity from the onset.

    Here you're talking about the construction of languages, not about the
    construction of parsers ;-)

    DoDi
  • No.18 | | 5310 bytes | |

    Max Hailperin schrieb:

    >1. LL parsers cannot handle left recursion, whereas LR parsers cannot
    >handle right recursion.
    >>

    >2. Most languages are ambiguous at the syntax level, so that implicit
    >disambiguation (longest match) or explicit semantical constraints
    >must be introduced. (see: dangling else).
    >>

    >3. LL(1) recursive descent parsers are readable, that's why no
    >LL(k) parser generators exist, in contrast to LR(k) parser generators.
    >>

    >4. When at least one terminal must be consumed, before a recursive
    >invocation of a rule, no infinite recursion can occur. (every loop will
    >terminate after all terminals in the input stream have been consumed)
    >>

    >Any objections, so far?
    >

    Yes.

    Point 1 is incorrect; LR parsers can handle right recursion in
    addition to left recursion.

    Shame on me :-(

    Point 2 is incorrectly stated, at the least. You seem to have meant
    not that the languages are ambiguous, but rather than particular
    grammars that are commonly used for those languages are ambiguous.

    As already stated in a parallel answer, a language can allow for
    multiple different parse trees of the same (valid) input, what has been
    considered as "ambigous" by several contributors. Can you suggest a
    better wording?

    Taking the example of the dangling else problem, there are unambiguous
    grammars for this as well as ambiguous ones. The published reference
    grammar for Java, for example, is unambiguous,

    Just looked it up, and the Java grammar does not only look horrible to
    me, it also can serve as an example of IM inappropriate techniques in
    the design of a language. I just don't know about the order of this
    procedure (exponential?), when it had to be applied to disambiguate more
    than one production :-(

    IM the maintenance of such a bloated grammar is at least very
    inconvenient and error prone.

    At least a much simpler Java grammar could have been constructed, by
    dropping certain compatibility with the C language syntax in the fist step.

    rather than relying
    upon an extragrammatical disambiguation rule. (Note that such a
    disambiguation rule is not semantic, contrary to what you say, just a
    portion of the syntax specification that may, in some cases, be
    presented extragrammatically.)

    I'm just thinking about the implications, built into any kind of grammar
    or grammar notation. Do they exist, beyond the syntax of the meta
    language? are they only introduced by specific parser generators, for
    the sake of shorter grammar notation?

    Therefore, it seems that your point 2
    boils down to a claim that "most" of the time the specifiers of
    programming language syntaxes find it more convenient to use an
    ambiguous grammar coupled with extragrammatical disambiguation rules,
    rather thon to use an unambiguous grammar. This is an empirical
    question about the frequency with which different notational
    approaches are used, which I would be hesitant to try to answer just
    based on what I personally have seen. It certainly is not an
    implausible claim, however.

    Just as we use high level programming languages, and let an compiler
    translate it into detailed assembly code, we prefer to write compact
    grammars, and leave it to an parser generator to produce the equivalent
    code or automaton. In both cases we should work on the highest (most
    compact) level of notation, in order to keep the sources maintainable.
    I'm not really happy with macros and similar extensions, when used to
    make source code only shorter, at the cost of transparency. TH I
    really dislike bloated grammars, with a lot of similar productions,
    which have been introduced for more or less obvious purposes, as in the
    beforementioned Java grammar.

    consider, for example, the following grammar,
    which is LL(2) but not LL(1):

    S -a b S | a c

    The recursive descent parser for this can be written quite readably.

    Provided that this grammar really is not LL(1), it can be transformed
    easily into LL(1) form. IM this example clearly demonstrates the
    advantages of using higher level notations, like EBNF:

    S -a ( b S | c )

    instead of:

    S -a S2
    S2 -b S | c


    >Ad (3): This is the only reason for the preference of LR parsers. Why
    >spend time with tweaking a grammar to LR(1)/LL(1), when a parser
    >generator for LR(k) is available?
    >

    No, the preference for LR parsers over LL has little to do with point
    3, not only because point 3 is incorrect, but also because you are
    here addressing the k>1 case, whereas most practical parsers use k=1
    in any case.

    Just a question: is your above grammar LR(1)?
    I'm somewhat clueless in answering this question myself :-(

    the grammars suitable for LR(1) parsing are a proper superset
    of those suitable for LL(1) parsing.

    Thanks for this enlightenment :-)

    DoDi
  • No.19 | | 1137 bytes | |

    25 Jul 2006 00:40:16 -0400, Hans-Peter Diettrich wrote:

    SM Ryan schrieb:
    >
    >If the semantics of a subtract production are the value of the right
    >subtree is subtracted from the value of left subtree, then
    >3 - 2 - 1
    >with left recursion is
    >= (3 - 2) - 1 = 1 - 1 = 0
    >with right recursion is
    >= 3 - (2 - 1) = 3 - 1 = 2
    >

    This is a property of the asymmetric subtraction operation, which
    doesn't apply to the symmetric addition or multiplication operations.
    course it's a good idea to enforce a unique sequence of *numerical*
    operations in program code, whereas in mathematical formulas such
    additional restrictions should *not* be built into a grammar.

    Do you mean that association should be handled after parsing? That would be
    a strange language. However, if you merely argue that 3 - 2 - 1 should be
    parsed as:

    " -"
    /|\
    3 2 1

    or (maybe better) as

    "+"
    / | \
    / "-""-"
    / | \
    3 2 1

    then I would certainly agree with you. But this is also "built in grammar"
    to me.
  • No.20 | | 1165 bytes | |

    Carl Barron schrieb:

    ># 3. LL(1) recursive descent parsers are readable, that's why no
    ># LL(k) parser generators exist, in contrast to LR(k) parser generators.
    >

    What about ANTLR, or spirit in boost.? Both generate parsers
    ANTLR in the classical sense that it generates code to be compiled
    like yacc/bison does, and spirit is an embedded template based parser
    generator that lets the C++ compiler itself generate the grammar from
    the EBNF. There are other LL(k) for fixed k parser generators as
    well, but these are the ones I have some familiarity with.

    Unfortunately none of these generate Pascal code. If I were happy with
    C/C++, I could use a broad range of tools immediatel. But since I'm not
    happy with these languages, even not with C# or Java, I decided to write
    an CtoPascal translator first. Having finished that tool, I'll come back
    to those parser generators ;-)

    BTW, a template based parser generator for C++ is a very interesting
    approach :-)

    DoDi
    [I see you've never tried typing pascal yacc into Google. John]
  • No.21 | | 6721 bytes | |

    Tue, 25 Jul 2006, Hans-Peter Diettrich wrote:
    SM Ryan schrieb:
    [Hans-Peter wrote:]
    ># 2. Most languages are ambiguous at the syntax level, so that implicit
    ># disambiguation (longest match) or explicit semantical constraints
    ># must be introduced. (see: dangling else).
    >>

    >poorly designed programming languages are ambiguous. (Natural
    >languages are ambiguous and don't use deterministic language theory.)
    >

    Counterexample:
    The argument list of a subroutine is nothing but a list, which can be
    expressed using either left or right recursion in a grammar.

    That's a true statement, but I don't see what it's a "counterexample"
    to.

    Perhaps I misused the term "ambiguous" here, when I meant that different
    parse trees can be constructed for a sentence of a language?

    Well, that's obvious. I mean, look at the following sentence:
    "The dog chases the cat." Now, we can parse that into the tree

    chases dog
    / \ or we can make the tree / \
    dog cat The the
    / / / \
    The the chases cat

    , one of those parse trees is more "natural", and, viewed
    through human eyes, expresses something useful about the structure
    of the sentence (namely, that a sentence has a verb, and nouns are
    more important than articles, and so on).
    However, we could probably write one grammar to make the parse tree
    on the left, and another grammar to make the parse tree on the right.
    Does this mean that something is "wrong" with English, or that the
    sentence "The dog chases the cat" admits of "more than one possible
    parse tree"? No, not in any useful sense.

    In the same way, it's possible to create many different "parse trees"
    for a C-language "sentence", indicated here with indentation levels:

    if (x) if (x) if (x)
    if (y) if (y) if (y)
    foo(); foo(); foo();
    else else else
    bar(); bar(); bar();

    the leftmost parse tree is useful in understanding the semantics
    of the C program; therefore, we call it "correct", and design our
    grammars to create this parse tree in preference to the other two.
    The middle parse tree demonstrates the "dangling else" non-problem.
    It is a /wrong/ parse tree. No correct grammar for C will generate it.
    The rightmost parse tree demonstrates the "can't distinguish between
    'if' and 'foo'" non-problem. It is also a /wrong/ parse tree; wrong
    enough that it is instantly recognizable as nonsensical to human eyes.
    Again, no correct grammar for C will generate it.

    There is absolutely no qualitative difference between the wrongness
    of the middle parse tree and the wrongness of the rightmost parse tree.

    In the same way, C has this kind of false "ambiguity" in its lexer.
    The string "" parses as ++, then +, even though a human might think
    it could parse as +, then But any grammar that would let "" parse
    that way would be a /wrong/ grammar, and wouldn't qualify as a grammar
    for C.


    >Many programming language grammars are ambiguous because the people
    >writing the grammars are lazy and/or uneducated in these matters. The
    >dangling else nonproblem was solved some 40 years ago. Anyone who
    >thinks this is still a problem is just too lazy to write a well known
    >solution.
    >

    The dangling else problem can be solved by adding implicit general rules
    to the interpretation of a language (or grammar?). course there exist
    ways to prevent the occurence of such problems, just in the language
    specification. But AFAIR it's impossible to prove, in the general case,
    that a language is inambigous.

    The sentence above is ambiguous. :) If you mean "there's no general
    algorithm that will take a grammar G and decide whether it is ambiguous",
    I think you're right, just because that seems like it ought to be an
    NP-complete problem.
    However, if you mean "nobody has ever proven that any grammar is
    unambiguous", you're wrong. It's easy to prove that /some/ specific
    grammars are unambiguous. I'm sure all the major programming languages'
    reference grammars have been vetted for correctness by now, for example.

    Those are my main points. Following are nitpicks.

    My point is that table driven parsers are unreadable, due to the lost
    relationship between the parser code and the implemented language or
    grammar. Do there exist LR parsers or parser generators at all, which
    preserve the relationship between the parser code and the grammar?

    Again, you seem to confuse human-centered subjective terms like
    "preserve the relationship" with measurable terms. there is
    /some/ kind of special relationship between a yacc-generated parser
    and its target language something that makes it parse C and not
    Java or pig Latin. You're complaining that the relationship isn't
    close enough to the surface for humans to see but why is that
    important?
    After all, the point of table-driven parsers is that humans never
    /need/ to see the generated code. It "just works."

    [SM Ryan wrote:]
    >A language definition that does not depend on vague handwaving (one or
    >two such definitions actually exist) bases the semantics on the parse
    >tree. Since right and left recursion build different parse trees, this
    >issue is very important in definitions with riguous semantics.


    I see where you're coming from, but that's slightly too dogmatic IMH
    What would you say to a language definition that defined a comma-separated
    list as "a list of N expressions, separated by commas" (or some such)?
    That's unambiguous, but doesn't imply left- or right-recursion, or in
    fact any recursion at all.

    []
    No doubt, but IM you want to introduce more restrictions than required.
    A compiler is allowed to apply certain *valid* transformations on an
    parse tree, in so far I cannot see a reason why any grammar or parser
    for that language must enforce the construction of one-and-only-one
    valid parse tree.

    Certainly any language must ensure that all interpretations of a given
    sentence /mean the same thing/. The easiest way to do that is to give
    an unambiguous grammar, so that there's only one interpretation of each
    sentence in the language. You could do weirder things, but AFAIK pretty
    much nobody's ever tried, because the easy way is so much easier. :)
    -Arthur
  • No.22 | | 3664 bytes | |

    # poorly designed programming languages are ambiguous. (Natural
    # languages are ambiguous and don't use deterministic language theory.)
    #
    # Counterexample:
    # The argument list of a subroutine is nothing but a list, which can be
    # expressed using either left or right recursion in a grammar.

    This is not about syntax ambuigity, but operator associativity. The
    language definition can define
    a@b@c = (a@b)@c = a@(b@c)
    for some binary operator @. The syntax should still be unambiguous,
    but the language lets the compiler reorder operations.

    It's important to keep associativity distinct from syntax. Some
    operators are associative, some are ant-associative, and some are
    not associative in any form. For example integer adds are usually
    associative, but real adds are not. The precise order the operands
    are summed can be critical in the stability of real arithmetic.

    # The dangling else problem can be solved by adding implicit general rules
    # to the interpretation of a language (or grammar?). course there exist
    # ways to prevent the occurence of such problems, just in the language
    # specification. But AFAIR it's impossible to prove, in the general case,
    # that a language is inambigous.

    LR(1) parser generation can only succeed for an unambiguous grammar,
    in which case the language is proven unambiguous. If you avoid bison
    shortcuts and resolve all reduce conflicts, you will have an unambiguous
    LALR grammar and your language is unambiguous.

    The dangling else was solved shortly after it was discovered by defining
    open and closed statements. The solution is simple.

    # Recursive descent is not LL(k). Recursive descent is an implementation
    # technique not a language class.
    #
    # , but what's the relationship between leftmost/rightmost derivation
    # and a language?

    None. It's the order in which you replace nonterminals in string
    derivation. You end up with the same parse tree for the same grammar
    in either case.

    # I'm not sure what you want to tell me. AFAIR LR(k) (languages?
    # grammars?) can be transformed into LR(1), for which a finite state
    # machine can be constructed easily. I assume that a transformation from
    # LL(k) into LL(1) would be possible as well, using essentially the same
    # transformation algorithms.

    No. LL(k+1) includes languages which cannot be LL(k). Also I don't think
    there's mechanical process to convert LR(k) into LR(1), merely that it
    is possible.

    # My point is that table driven parsers are unreadable, due to the lost
    # relationship between the parser code and the implemented language or
    # grammar. Do there exist LR parsers or parser generators at all, which
    # preserve the relationship between the parser code and the grammar?

    Any parser can be considerred table driven with the CPU interpretting
    object code. I don't concern myself with the code the parser generator
    creates. I concern myself with the grammar representation I give the
    parser generator.

    # With regards to programming languages, there exist many constructs that
    # do not impose or require the construction of an specific (unique) parse

    And many that do. The language definition can declare some operators
    commute or associate and some do not; that if a program's evaluation
    changes when converted into what the language definition says is
    equivalent, then it is the program that is at fault. Thus the
    unambiguous parse trees can be transformed in a number ways by the
    compiler.
  • No.23 | | 6107 bytes | |

    SM Ryan schrieb:
    poorly designed programming languages are ambiguous. (Natural
    languages are ambiguous and don't use deterministic language theory.)

    Hans-Peter Diettrich <DrDiettrich1@aol.com(Dodi) writes:
    Counterexample:
    The argument list of a subroutine is nothing but a list, which can be
    expressed using either left or right recursion in a grammar.

    with iteration (e.g. one of the closure operators, such as + and
    *). I think Dodi's point here is well-taken. The notation in a
    grammar should make the problem being solved clearer and not introduce
    any extraneous details. If there is no specific associativity in the
    language, I would prefer not to introduce artificial associativity
    into the grammar. (It's pretty ovbvious that SM Ryan disagrees in
    that he doesn't think a grammar shoulod be ambiguous, and he makes
    that point several times in his posting. Whether he would consider
    the rest of what I said implying accepting ambiguity, I will have to
    wait for his judgement.)

    To my mind, one specifically choses to write:

    expr: expr "+" term
    | term;

    or:

    right "+", "-";
    right "*"; "/";
    left "**";
    expr: expr "+" expr;

    or:

    expr: term ("+" term)*

    depending on what one intends to say.

    I would write explicit precedence if I had a specific parsing problem
    and I wanted the fact that I was expressing that problem explicitly in
    the grammar to emphasize the fact.

    I would write ambiguous rules with precedence and associativity
    declarations if I wanted to emphasize the fact that the items were
    simply expressions and that the precendence and associativity were
    associated with the operators.

    I would write regular expressions if I wanted to emphasize the fact
    that the item corresponds to a "list" and the operators have no
    precedence at all are are merely separators between the items of
    interest. In fact, we have a pair of binary closure operators &* and
    &+ that we specifically support in Yacc++ to make that even more
    clear.

    expr: term &* "+"; // same as above, but more terse

    Dodi again:
    Perhaps I misused the term "ambiguous" here, when I meant that different
    parse trees can be constructed for a sentence of a language?

    That is the definition of ambiguous. An ambiguous graqmmar admits to
    or more parses (i.e. parse trees) for a given sentence. An unambiguous
    grammar admits only one parse.

    SM Ryan again:
    Many programming language grammars are ambiguous because the people
    writing the grammars are lazy and/or uneducated in these matters. The
    dangling else nonproblem was solved some 40 years ago. Anyone who
    thinks this is still a problem is just too lazy to write a well known
    solution.

    However, I don't believe there is an LL grammar (in the strict
    technical sense) that solves the dangling else problem. can only
    solve the dangling else problem with a "hack" to the LL solution.
    Moreover, if one wants to do that, the "best" grammar to give to the
    LL generator (that supports the hack) is the ambiguous one.

    Note, that I have nothing against that "hack". It is hack in a good
    sense of the word (i.e. an elegant solution that takes a non-obvious
    short-cut that correctly handles the cases of interest).

    Dodi again:
    I'm not sure what you want to tell me. AFAIR LR(k) (languages?
    grammars?) can be transformed into LR(1), for which a finite state
    machine can be constructed easily. I assume that a transformation from
    LL(k) into LL(1) would be possible as well, using essentially the same
    transformation algorithms.

    Unfortunately not true. Moreover, I don't believe there is an
    algorithm for transforming an LR(k) grammar into an LR(1) grammar,
    just a (non-constructive) proof that such a grammar exists. Thus, if
    you have an LR(k) grammar and only an LR(1) tool, you are also
    out-of-luck.

    My point is that table driven parsers are unreadable, due to the lost
    relationship between the parser code and the implemented language or
    grammar. Do there exist LR parsers or parser generators at all, which
    preserve the relationship between the parser code and the grammar?

    Table driven parsers (at least LR ones) are more unreadable than
    recursive descent ones in the same sense that the drawing (or a table
    describing the drawing) of an DFA is less-readable than the
    corresponding regular expression. I think there is some cognitive
    effects that allow a person's mind to ignore the parallelism in a
    linear presentation (e.g. a recursive descent routine or a regular
    expression) and to focus on the depth-first properities which are
    brought to the surface in such representations. People do not seem to
    think as well in breath-first representations. I think this has to do
    with some anthropomorphising where one inserts oneself into the
    current location in the code as it progresses through one case. This
    works the same way people often single-step through code in a
    debugger.

    That being said, one can learn to deal quite effectively with the DFAs
    of an LR machine. Moreover, I have find that the single-step approach
    often blinds one to things happening in other paths that one "should"
    be considering. In fact, one of my biggest complaints about
    hand-generated recursive descent is that it lends itself to ad-hack
    approaches (hack used pejoratively here), where one twiddles with
    local parts of the parser to get it to accept the case one is
    interested in, without considering whether one has a consistent
    grammar (i.e. that the same code works in another context).

    Hope this doesn't confuse the issues any more,
    -Chris

    Chris Clark Internet : compres@world.std.com
    Compiler Resources, Inc. Web Site : http://world.std.com/~compres
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

  • No.24 | | 1011 bytes | |

    Max Hailperin <max@gustavus.eduwrites:

    You seem to be conceding too much here. The essential difference is
    in the parsers, not just the parser generators. Recall (as I
    explained in )
    that an LR parser is a shift/reduce parser whereas an LL parser is a
    confirm/expand parser.

    Point well-taken and well-made. Moreover, anyone who wants to
    understand LL and LR parsing would be well advised to read Max's
    article that he has kindly pointed us to. To my mind, it was a clear
    and lucid explanation of the principles behind how the parsing
    algorithms work that can help one have an informed intuition.

    <In fact, John, if it isn't referenced in the FAQ, I would recomend
    adding a pointer to it in the FAQ.>
    -Chris

    Chris Clark Internet : compres@world.std.com
    Compiler Resources, Inc. Web Site : http://world.std.com/~compres
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

  • No.25 | | 2800 bytes | |

    Hans-Peter (Dodi) wrote:
    My point is that table driven parsers are unreadable, due to the lost
    relationship between the parser code and the implemented language or
    grammar. Do there exist LR parsers or parser generators at all, which
    preserve the relationship between the parser code and the grammar?

    "Arthur J. 'Dwyer" <ajonospam@andrew.cmu.eduwrites:
    Again, you seem to confuse human-centered subjective terms like
    "preserve the relationship" with measurable terms. there is
    /some/ kind of special relationship between a yacc-generated parser
    and its target language something that makes it parse C and not
    Java or pig Latin. You're complaining that the relationship isn't
    close enough to the surface for humans to see but why is that
    important?
    After all, the point of table-driven parsers is that humans never
    /need/ to see the generated code. It "just works."

    Arthur, if that were only true!!! When it is true, life is wonderful.
    When it isn't true, because your parser generator has given you a
    conflict message and one needs to understand why your grammar is
    incorrect and what you should do about it, life is painful. In that
    case, one really does need to be able to read the tables and
    understand them. For many it seems, that task is akin to visiting the
    dentist. For that reason alone, I don't think recursive descent will
    ever die (and nor should it, at least not in the machine generated
    version).

    Note, this in fact gives me mixed feelings (because I've worked hard
    to make Yacc++ accept a wide class of grammars, wider than traditional
    LALR(1)). In general, the more powerful the parsing method, the more
    difficult it is to fix problems when the parser is misbehaving. (I'm
    reminded of a quote where Mr. Scott (of Star Trek) is standing with 4
    ball-bearings in hand looking at an incapacitated starship with
    trans-warp drive, saying roughly, "the more complicated the plumbing,
    the easier it is to break".) My feeling is that when GLR parsers
    break, they are harder to fix than simple LR ones, just as LR ones are
    harder to fix than LL ones. Moreover, I'm more certain that it is
    harder to "know" that a GLR parser is broken than an LR one, atleast
    if by broken one means unanticipated ambiguity, because the whole
    point of GLR parsing is to allow certain ambiguities, and I think that
    it would be hard to prove that one has only allowed the specific
    allowed ambiguities.
    -Chris

    Chris Clark Internet : compres@world.std.com
    Compiler Resources, Inc. Web Site : http://world.std.com/~compres
    23 Bailey Rd voice : (508) 435-5016
    Berlin, MA 01503 USA fax : (978) 838-0263 (24 hours)

  • No.26 | | 906 bytes | |

    Hans-Peter Diettrich <DrDiettrich1@aol.comwrites:

    >Point 2 is incorrectly stated, at the least. You seem to have meant
    >not that the languages are ambiguous, but rather than particular
    >grammars that are commonly used for those languages are ambiguous.
    >

    As already stated in a parallel answer, a language can allow for
    multiple different parse trees of the same (valid) input, what has been
    considered as "ambigous" by several contributors. Can you suggest a
    better wording?

    A language in the technical sense is just a set of strings. There is
    no notion of a parse tree until you provide a grammar. However, a
    language can be "ambiguous" in the sense that there are no
    non-ambiguous grammars for it (while there are ambiguous ones). Such
    languages are called "inherently ambiguous".

    Matthias

  • No.27 | | 767 bytes | |

    Dmitry A. Kazakov schrieb:

    Do you mean that association should be handled after parsing?

    If there exist no such restrictions, many things can be done after
    parsing. All optimizations, which a compiler for a certain language is
    allowed to apply, obviously must be made after parsing. Such
    transformations must respect the semantics of the language, but not
    their syntax, which has gone away after parsing.

    []
    But this is also "built in grammar" to me.

    When a strict formal description is given for a language, e.g. in form
    of a grammar, these details of course must be reflected in any grammar
    for that language. a grammar only must conform to the informal
    description of the language.

    DoDi

  • No.28 | | 440 bytes | |

    Hans-Peter Diettrich schrieb:

    [I see you've never tried typing pascal yacc into Google. John]

    A few months ago none of the free tools was in a really usable state. I
    just looked again, and found none but the well known tply, dyacclex and
    dcg, whose development has been abandoned many years ago.

    Did I miss something?

    DoDi
    [I see something called pyacc that comes with Debian. -John]
  • No.29 | | 3046 bytes | |

    Arthur J. 'Dwyer schrieb:

    # 2. Most languages are ambiguous at the syntax level, so that implicit
    # disambiguation (longest match) or explicit semantical constraints
    # must be introduced. (see: dangling else).

    poorly designed programming languages are ambiguous. (Natural
    languages are ambiguous and don't use deterministic language theory.)
    >Counterexample:
    >The argument list of a subroutine is nothing but a list, which can be
    >expressed using either left or right recursion in a grammar.
    >

    That's a true statement, but I don't see what it's a "counterexample"
    to.

    With regards to "poorly designed programming languages".

    After all I agree, that a language with a stricter definition leaves
    less room for different opinions about the implementation. ?

    In the same way, it's possible to create many different "parse trees"
    for a C-language "sentence", indicated here with indentation levels:

    if (x) if (x) if (x)
    if (y) if (y) if (y)
    foo(); foo(); foo();
    else else else
    bar(); bar(); bar();

    the leftmost parse tree is useful in understanding the semantics
    of the C program; therefore, we call it "correct", and design our
    grammars to create this parse tree in preference to the other two.

    How will you determine, in this special case or in general, which
    grammars or parse trees are "correct"?

    Those are my main points. Following are nitpicks.
    >
    >My point is that table driven parsers are unreadable, due to the lost
    >relationship between the parser code and the implemented language or
    >grammar. Do there exist LR parsers or parser generators at all, which
    >preserve the relationship between the parser code and the grammar?
    >

    Again, you seem to confuse human-centered subjective terms like
    "preserve the relationship" with measurable terms. there is
    /some/ kind of special relationship between a yacc-generated parser
    and its target language something that makes it parse C and not
    Java or pig Latin. You're complaining that the relationship isn't
    close enough to the surface for humans to see but why is that
    important?

    Consider that typically you want to add actions to a grammar. Then you
    must know about the context (stack contents), in which your code will
    execute.

    Finally you may want to debug your grammar and your code. In case of an
    recursive descent parser, the call stack contains much information about
    the parser state, whereas debugging the state transitions of an
    automaton requires additional debugging features.

    After all, the point of table-driven parsers is that humans never
    /need/ to see the generated code. It "just works."

    I found some yacc clones, which all just do *not* work. Now tell me how
    somebody should figure out, what makes these tools misbehave :-(

    DoDi

Re: Why LL(1) Parsers do not support left recursion?


max 4000 letters.
Your nickname that display:
In order to stop the spam: 1 + 0 =
QUESTION ON "Compilers"

EMSDN.COM