Perl

  • Named captures.


    On 2/9/07, Abigail <abigail (AT) abigail (DOT) be> wrote:
    If a regular expression contains two named captures with the same
    name, $+ {NAME} returns the leftmost *defined* capture. I'm not
    sure how useful that is - I think I'd prefer the leftmost capture,
    whether defined or not.
    Consider the following code:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1.2 3.4" =~ /$re $re/) {
    print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
    This prints "1 2", as expected. But if we change it to:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
    print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
    it prints "1 4", getting something from the first $re and something from
    the second. I would have expected "1 UNDEF".
    Just to repeat what I said on IRC today:
    For this you can use %-, which will give an array of the results of
    each capture buffer of a given name. So the above example could be
    written:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
    foreach my $set (0, 1) {
    print $- {integer}[$set], " ", $- {fraction}[$set] // "UNDEF", "\n";
    }
    }
    You'd get the results from each pair correctly ordered.
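
    To compare the two hashes side by side, here is a minimal sketch, assuming Perl 5.10 or later (named captures, %+, %- and // all need it). The variable names @first_defined and @per_set are illustrative only and do not come from the thread.

```perl
use strict;
use warnings;

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;

my (@first_defined, @per_set);
if ("1 3.4" =~ /$re $re/) {
    # %+ yields the leftmost *defined* capture for each name.
    @first_defined = ($+{integer}, $+{fraction} // "UNDEF");

    # %- yields an array ref holding every capture with that name,
    # in left-to-right order, including the undefined ones.
    for my $set (0, 1) {
        push @per_set,
            join(" ", $-{integer}[$set], $-{fraction}[$set] // "UNDEF");
    }
}

print "leftmost defined: @first_defined\n";
print "per set: $_\n" for @per_set;
```

    The first line shows the mixed "1 4" result discussed above; the per-set lines keep each integer with its own fraction ("1 UNDEF", then "3 4").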
    Although this reminds me,
    STMT for keys %-;
    STMT for keys %+;
    Both warn about being ambiguous, which is annoying.
    D:\dev\perl\ver\zoro\win32>perl -e"$x++ for keys %+"
    Warning: Use of "keys" without parentheses is ambiguous at -e line 1.
    D:\dev\perl\ver\zoro\win32>perl -e"$x++ for keys %-"
    Warning: Use of "keys" without parentheses is ambiguous at -e line 1.
    D:\dev\perl\ver\zoro\win32>perl -e"$x++ for keys %x"
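
    As the warning text itself suggests, parenthesising the argument avoids the ambiguity. A small sketch, assuming Perl 5.10 or later; the %all hash is an illustrative name only:

```perl
use strict;
use warnings;

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;

my %all;
if ("1 3.4" =~ /$re $re/) {
    # keys(%-) with explicit parentheses, so the parser has nothing to guess.
    for my $name (sort keys(%-)) {
        # Copy every capture recorded under this name, marking undefined ones.
        $all{$name} = [ map { $_ // "UNDEF" } @{ $-{$name} } ];
    }
}

print "$_: @{ $all{$_} }\n" for sort keys %all;
```

    This collects one list per capture name, so nothing is lost when a name is repeated in the pattern.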
    Now, my guess is that returning the leftmost defined capture is useful
    in cases like:
    /(?<foo>PAT1)bar|(?<foo>PAT2)baz/
    But you could make use of (?|) then:
    /(?|(?<foo>PAT1)bar|(?<foo>PAT2)baz)/
    That is, if you have the same NAME repeated, return the leftmost capture
    regardless whether defined or not, and use (?| ) if you want the leftmost
    defined one.
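
    A short sketch of the difference between the two patterns as Perl 5.10+ behaves today, with a+ and b+ standing in for PAT1 and PAT2 (those sub-patterns, and the names $plain, $reset and %seen, are assumptions made for the example):

```perl
use strict;
use warnings;

my $plain = qr/(?<foo>a+)bar|(?<foo>b+)baz/;
my $reset = qr/(?|(?<foo>a+)bar|(?<foo>b+)baz)/;

my %seen;
if ("bbbaz" =~ $plain) {
    # Two separate groups: the second alternative fills $2 and $1 stays
    # undefined, yet $+{foo} still finds the defined capture.
    $seen{plain} = [ defined $1 ? "set" : "unset", $2, $+{foo} ];
}
if ("bbbaz" =~ $reset) {
    # Branch reset: both alternatives number their group 1.
    $seen{reset} = [ $1, $+{foo} ];
}

print "$_: @{ $seen{$_} }\n" for sort keys %seen;
```

    With (?| ) the name always refers to the single group 1, so "leftmost" and "leftmost defined" coincide.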
    Yes, I guess so.
    Although I think that given that we have %- this just becomes TMTOWTDI.
    Alternatively, leave it as is, and have a way of getting to all the named
    captures, even if they share the same name. For instance after:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
    print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
    $+ {"integer"} eq '1' # Leftmost defined.
    $+ {"fraction"} eq '4' # Leftmost defined.
    $+ {"integer.1"} eq '1' # First capture of 'integer'
    $+ {"fraction.1"} eq undef # First capture of 'fraction'
    $+ {"integer.2"} eq '3' # Second capture of 'integer'
    $+ {"fraction.2"} eq '4' # Second capture of 'fraction'
    Yep, that's pretty much what you can do, but the syntax on the last four would be
    $-{$name}[$idx]
    Another issue, the NAME of named captures have similar constraints on
    the name as identifiers - except that you cannot use '::' inside them.
    That's a pity because with '::' there would be an obvious way of using
    name spaces in your NAMEs.
    I'm open to modifications of the naming rule so long as it ensures
    that names are easy to parse and can't be confused with integers
    (possibly signed) so that \g{} stays unambiguous.
    Cheers,
    Yves
  • Reply No. 1

    On 09/02/07, demerphq <demerphq (AT) gmail (DOT) com> wrote:
    Although this reminds me,

    STMT for keys %-;
    STMT for keys %+;

    Both warn about being ambiguous, which is annoying.

    D:\dev\perl\ver\zoro\win32>perl -e"$x++ for keys %+"
    Warning: Use of "keys" without parentheses is ambiguous at -e line 1.

    The simplest fix for that is to tell scan_ident to avoid making this
    check for hashes. It doesn't do it for $xxx, &xxx and @xxx
    identifiers. Not sure why. Probably because there weren't many
    punctuation hashes, and because % is also a binary operator. (but
    again, & is, too.)

    Untested, and I've not really thought about it:

    --- /tmp/tmp.81624.0 Fri Feb 9 13:00:39 2007
    +++ /home/rafael/p4blead/toke.c Fri Feb 9 12:57:02 2007
    @@ -4132,7 +4132,7 @@ Perl_yylex(pTHX)
         Mop(OP_MODULO);
     }
     PL_tokenbuf[0] = '%';
    -s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, TRUE);
    +s = scan_ident(s, PL_bufend, PL_tokenbuf + 1, sizeof PL_tokenbuf - 1, FALSE);
     if (!PL_tokenbuf[1]) {
         PREREF('%');
     }

    Another issue, the NAME of named captures have similar constraints on
    the name as identifiers - except that you cannot use '::' inside them.
    That's a pity because with '::' there would be an obvious way of using
    name spaces in your NAMEs.

    I'm open to modifications of the naming rule so long as it ensures
    that names are easy to parse and can't be confused with integers
    (possibly signed) so that \g{} stays unambiguous.

    So adding :: would be fine ? (or maybe a single char separator. But
    we've already got _.)
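
    To check what a given perl currently accepts as a capture name, one option is to try compiling a pattern at run time. A rough sketch; name_compiles is a hypothetical helper written for this illustration, not an existing function:

```perl
use strict;
use warnings;

# Hypothetical helper: reports whether NAME is accepted as a capture name
# by the running perl, by attempting to compile a pattern that uses it.
sub name_compiles {
    my ($name) = @_;
    my $pattern = "(?<$name>x)";
    # A rejected name makes the regex compilation die; eval traps that.
    my $ok = eval { my $compiled = qr/$pattern/; 1 };
    return $ok ? 1 : 0;
}

print "$_: ", (name_compiles($_) ? "ok" : "rejected"), "\n"
    for qw(ns_name ns::name);
```

    Under the naming rule described above, the underscore form compiles and the '::' form is rejected, which is why _ is the only separator available for now.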
