  • Named captures.


    If a regular expression contains two named captures with the same
    name, $+ {NAME} returns the leftmost *defined* capture. I'm not
    sure how useful that is - I think I'd prefer the leftmost capture,
    whether defined or not.
    Consider the following code:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1.2 3.4" =~ /$re $re/) {
    print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
    This prints "1 2", as expected. But if we change it to:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
    print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
    it prints "1 4", getting something from the first $re and something from
    the second. I would have expected "1 UNDEF".
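    Looking at the numbered groups makes the "1 4" easier to see. A minimal
    sketch (plain Perl 5.10 or later; @groups and $plus_fraction are
    illustrative names): interpolating $re twice creates four groups, the
    first "fraction" group never participates, so %+ skips it and falls
    through to the second one.

```perl
use strict;
use warnings;

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
my (@groups, $plus_fraction);

if ("1 3.4" =~ /$re $re/) {
    # Interpolating $re twice gives four numbered groups:
    #   $1 = integer, $2 = fraction (first copy),
    #   $3 = integer, $4 = fraction (second copy).
    @groups        = ($1, $2, $3, $4);
    # %+ returns the leftmost *defined* group named "fraction", i.e. $4.
    $plus_fraction = $+{fraction};
}

print join(" ", map { $_ // "UNDEF" } @groups), "\n";   # prints "1 UNDEF 3 4"
print "\$+{fraction} = $plus_fraction\n";               # prints "$+{fraction} = 4"
```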
    Now, my guess is that returning the leftmost defined capture is useful
    in cases like:
    /(?<foo>PAT1)bar|(?<foo>PAT2)baz/
    But you could use (?|...) for that:
    /(?|(?<foo>PAT1)bar|(?<foo>PAT2)baz)/
    That is, if you have the same NAME repeated, return the leftmost capture
    regardless of whether it is defined, and use (?| ) if you want the leftmost
    defined one.
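    The difference between the two forms shows up in the numbered groups. A
    small sketch, assuming the hypothetical sub-patterns \d+px and \d+em stand
    in for PAT1bar and PAT2baz: without (?|...) the second "foo" is group 2
    and %+ finds it by skipping the undefined group 1; with (?|...) both
    alternatives fill group 1.

```perl
use strict;
use warnings;

# Without branch reset, the two "foo" groups are numbered 1 and 2.
my $plain = qr/(?<foo>\d+)px|(?<foo>\d+)em/;
# With (?|...), numbering restarts in each alternative, so both are group 1.
my $reset = qr/(?|(?<foo>\d+)px|(?<foo>\d+)em)/;

my (%plain, %reset);
if ("12em" =~ $plain) {
    %plain = (foo => $+{foo}, one => $1, two => $2);
}
if ("12em" =~ $reset) {
    %reset = (foo => $+{foo}, one => $1, two => $2);
}

# Print what %+, $1 and $2 held after each match.
for my $case ([plain => \%plain], [reset => \%reset]) {
    my ($label, $h) = @$case;
    printf "%-5s foo=%s \$1=%s \$2=%s\n", $label,
        map { $h->{$_} // "UNDEF" } qw(foo one two);
}
```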
    Alternatively, leave it as is, and have a way of getting at all the named
    captures, even if they share the same name. For instance, after:
    my $re = qr /(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
    if ("1 3.4" =~ /$re $re/) {
    print $+ {integer}, " ", $+ {fraction} // "UNDEF", "\n";
    }
    $+ {"integer"} eq '1' # Leftmost defined.
    $+ {"fraction"} eq '4' # Leftmost defined.
    $+ {"integer.1"} eq '1' # First capture of 'integer'
    !defined $+ {"fraction.1"} # First capture of 'fraction'
    $+ {"integer.2"} eq '3' # Second capture of 'integer'
    $+ {"fraction.2"} eq '4' # Second capture of 'fraction'
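    For what it's worth, perl 5.10 and later have a %- hash that covers much
    of this, though not with the "name.N" keys sketched above: each key is a
    group name and each value is a reference to an array of all groups with
    that name, left to right, with undef for groups that did not participate.
    A minimal sketch against the same example (@integers and @fractions are
    illustrative names):

```perl
use 5.010;
use strict;
use warnings;

my $re = qr/(?<integer>\d+)(?:\.(?<fraction>\d+))?/;
my (@integers, @fractions);

if ("1 3.4" =~ /$re $re/) {
    # %- maps each name to a reference to an array holding *every* group
    # with that name, left to right, with undef for groups that did not match.
    @integers  = @{ $-{integer} };
    @fractions = @{ $-{fraction} };
}

say "integer:  ", join ", ", map { $_ // "UNDEF" } @integers;    # 1, 3
say "fraction: ", join ", ", map { $_ // "UNDEF" } @fractions;   # UNDEF, 4
```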
    Another issue: NAMEs of named captures have constraints similar to
    those on identifiers, except that you cannot use '::' inside them.
    That's a pity, because '::' would give an obvious way of using
    namespaces in your NAMEs.
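    The rule is easy to probe from a script. A minimal sketch, assuming a perl
    in which '::' is still not allowed in group names (as far as I know, this
    proposal was not adopted); Foo_Bar_something is a hypothetical
    underscore-separated stand-in for a namespaced name. The exact error text
    varies between perl versions, so it is only printed.

```perl
use strict;
use warnings;

# Compile each pattern at run time with string eval so that a rejected
# group name is trapped in $@ instead of aborting the whole script.
my $underscore_ok = eval q{ my $r = qr/(?<Foo_Bar_something>foo)/; 1 } ? 1 : 0;
my $colons_ok     = eval q{ my $r = qr/(?<Foo::Bar::something>foo)/; 1 } ? 1 : 0;
chomp(my $colons_error = $@);

print "Foo_Bar_something:   ", ($underscore_ok ? "accepted" : "rejected"), "\n";
print "Foo::Bar::something: ", ($colons_ok ? "accepted" : "rejected: $colons_error"), "\n";
```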
    Abigail
  • No.1

    On 09/02/07, Abigail <abigail (AT) abigail (DOT) be> wrote:
    The advantage of allowing '::' is that you can create names
    using __PACKAGE
    --
    Hmmm.

    package Foo::Bar::Baz;

    "foo" =~ /(?<something>foo)/;

    print $+ {Foo::Bar::Baz} {something}; # Prints 'foo'.

    Hmmmmm. Another level of indirection ? (optional?) I don't like it
    very much, mostly because you can have several regexps in the same
    package.
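    A userland approximation of $+ {Foo::Bar::Baz} {something} is to copy %+
    into a hash keyed by __PACKAGE__ after each match. This is a hypothetical
    sketch, not part of any proposal in this thread; %captures_by_package is
    an illustrative name. It also shows the objection above: a second match
    in the same package would overwrite the same entry.

```perl
package Foo::Bar::Baz;
use strict;
use warnings;

# Hypothetical stand-in for the proposed $+ {Foo::Bar::Baz} {something}:
# copy the named captures into a hash keyed by the current package name.
# %captures_by_package is an illustrative name, not a Perl built-in.
my %captures_by_package;

if ("foo" =~ /(?<something>foo)/) {
    # The unary + stops __PACKAGE__ from being auto-quoted as the string "__PACKAGE__".
    $captures_by_package{ +__PACKAGE__ } = { %+ };
}

print $captures_by_package{'Foo::Bar::Baz'}{something}, "\n";   # prints "foo"
```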
  • No.2

    On Fri, Feb 09, 2007 at 02:51:35PM +0100, Rafael Garcia-Suarez wrote:
    On 09/02/07, Abigail <abigail (AT) abigail (DOT) be> wrote:
    >The advantage of allowing '::' is that you can create names
    >using __PACKAGE
    >
    >
    >Hmmm.
    >

    >package Foo::Bar::Baz;
    >
    >"foo" =~ /(?<something>foo)/;
    >
    >print $+ {Foo::Bar::Baz} {something}; # Prints 'foo'.
    >
    >Hmmmmm. Another level of indirection ? (optional?) I don't like it
    >very much, mostly because you can have several regexps in the same
    >package.

    Yeah, and it won't work for what I intended it for anyway. I thought
    it would be nice with Regexp::Common, as it would then put all
    the captures in $+ {Regexp::Common::something} {}, but that would
    only work if such a binding happened at (regexp) compile time and not
    at run time.

    Abigail

  • No.3

    On Fri, Feb 09, 2007 at 01:01:48PM +0100, Rafael Garcia-Suarez wrote:
    On 09/02/07, demerphq <demerphq (AT) gmail (DOT) com> wrote:

    >Another issue, the NAME of named captures have similar constraints on
    >the name as identifiers - except that you cannot use '::' inside them.
    >That's a pity because with '::' there would be an obvious way of using
    >name spaces in your NAMEs.
    >
    >I'm open to modifications of the naming rule so long as it ensures
    >that names are easy to parse and can't be confused with integers
    >(possibly signed) so that \g{} stays unambiguous.


    So adding :: would be fine ? (or maybe a single char separator. But
    we've already got _.)

    The advantage of allowing '::' is that you can create names
    using __PACKAGE

    Hmmm.

    package Foo::Bar::Baz;

    "foo" =~ /(?<something>foo)/;

    print $+ {Foo::Bar::Baz} {something}; # Prints 'foo'.

    Now, that would be nice.

    Abigail
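    demerphq's condition above, that names must not look like (possibly
    signed) integers, is about \g{}, where group numbers, relative offsets
    and names all share the same braces. A small sketch of the three forms
    as they exist in perl 5.10 and later (%matched is an illustrative name):

```perl
use strict;
use warnings;

# \g{N}    : absolute backreference to group N
# \g{-N}   : relative backreference, N groups back from this point
# \g{NAME} : backreference to the named group NAME
my %matched = (
    absolute => ("aa"  =~ /^(\w)\g{1}$/       ? 1 : 0),
    relative => ("abb" =~ /^(\w)(\w)\g{-1}$/  ? 1 : 0),
    named    => ("cc"  =~ /^(?<ch>\w)\g{ch}$/ ? 1 : 0),
    mismatch => ("ab"  =~ /^(?<ch>\w)\g{ch}$/ ? 1 : 0),
);

printf "%-8s %s\n", $_, $matched{$_} for sort keys %matched;
```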

