PerlMonks  

RegEx Headaches

by oryx3 (Novice)
on Jun 19, 2013 at 17:39 UTC (#1039811=perlquestion)
oryx3 has asked for the wisdom of the Perl Monks concerning the following question:

Here's what I've got:

  DB<30> x $_
0  'ActionLogs.1.1998.xml'
  DB<31> x /(\d+\.)+.*xml/g
0  1998.
  DB<32> x /(\d+)+.*xml/g
0  1

Here's what I want:

0  1
1  1998

How do I get from what I've got to what I want?

Re: RegEx Headaches
by 5mi11er (Deacon) on Jun 19, 2013 at 18:01 UTC

      Yes, you answered the question correctly. Unfortunately, it was the wrong question. (That's my fault, not yours.)

      I should have added: the filename can have _one or more_ digit groups separated by periods in the middle. I want to extract them ALL!

      So, for example:

      ActionLogs.1.2.3.4.5.6.7.8.9.xml

      Should yield

      0  1
      1  2
      2  3
      3  4
      4  5
      5  6
      6  7
      7  8
      8  9

      I thought the g would do that, but I guess not.

        One way to do it.

          DB<1> $_ = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml'
          DB<2> x split ' ', tr/.[a-zA-Z]/ /dr
        0  1
        1  2
        2  3
        3  4
        4  5
        5  6
        6  7
        7  8
        8  9
          DB<3>

        Update: The brackets are unnecessary; tr/.a-zA-Z/ /dr works just as well.
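Outside the debugger, the same transliteration idea might look like the sketch below. It assumes Perl 5.14 or later (the /r modifier does not exist before that) and a filename made up only of ASCII letters, digits and periods; the sub name digit_groups_tr is hypothetical, chosen just for this illustration.

```perl
use strict;
use warnings;
use 5.014;    # tr///r (non-destructive transliteration) needs 5.14+

# Hypothetical helper: map every '.' to a space, delete all ASCII letters
# (the /d modifier drops characters with no replacement), then split the
# remaining string on whitespace.
sub digit_groups_tr {
    my ($name) = @_;
    return split ' ', $name =~ tr/.a-zA-Z/ /dr;
}

say join ' ', digit_groups_tr('ActionLogs.1.2.3.4.5.6.7.8.9.xml');
```

Any other character in the name (an underscore or a hyphen, say) would survive the tr and end up glued to a neighbouring digit group, so this only suits names as regular as the ones in the question.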

Re: RegEx Headaches
by bart (Canon) on Jun 19, 2013 at 18:41 UTC
    You can do
    @numbers = /(\d+)(?=\.)(?=.*\.xml$)/g
    This uses lookahead, which will match the rest of the string, but not consume it.

    If you don't need the extra check, you can just do

    @numbers = /(\d+)/g
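To see where the extra check matters, here is a small sketch comparing the two patterns. The sub names strict_groups and loose_groups and the second and third filenames are made up for illustration; only the first filename comes from the question.

```perl
use strict;
use warnings;

# Stricter pattern from the post: the digits must be followed by a period,
# and the string must end in ".xml".
sub strict_groups { my ($s) = @_; return $s =~ /(\d+)(?=\.)(?=.*\.xml$)/g }

# Looser pattern: any run of digits anywhere in the string.
sub loose_groups  { my ($s) = @_; return $s =~ /(\d+)/g }

for my $name ('ActionLogs.1.1998.xml', 'Action2Logs.1.xml', 'ActionLogs.1.1998.txt') {
    printf "%-24s strict=(%s) loose=(%s)\n", $name,
        join(',', strict_groups($name)), join(',', loose_groups($name));
}
```

The two agree on the original filename. The stricter pattern skips the stray 2 in Action2Logs because no period follows it, and returns nothing at all for the .txt name.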

      Hmmm, interesting. Now I've got:

        DB<23> x $_
      0  'ActionLogs.1.2.3.4.5.6.7.8.9.xml'
        DB<24> x /(\d+\.)/g
      0  1.
      1  2.
      2  3.
      3  4.
      4  5.
      5  6.
      6  7.
      7  8.
      8  9.

      but:

        DB<25> x /(\d+\.)+/g
      0  9.

      Can anyone tell me why the quantifier makes it match only the last group?

        The quantifier actually makes it match every group; however, it only stores the last match in $1, and thus that's what you see in your return. See Extracting matches in perlretut.
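A short sketch of that behaviour, with an extra outer group added (my addition, not part of the pattern in the question) so the whole match can be seen next to what the quantified group is left holding. The sub names are hypothetical.

```perl
use strict;
use warnings;

# One overall match. The inner group is repeated by '+', and its capture
# buffer is overwritten on every iteration, so only the last piece survives.
# Returns ($1, $2): the whole run, and the inner group's final iteration.
sub quantified { my ($s) = @_; return $s =~ /((\d+\.)+)/ }

# Unquantified group with /g: one separate match per digit group,
# so list context yields one capture per match.
sub repeated { my ($s) = @_; return $s =~ /(\d+\.)/g }

my $name = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml';

my ($whole, $last) = quantified($name);
print "whole match:           $whole\n";   # 1.2.3.4.5.6.7.8.9.
print "inner group at finish: $last\n";    # 9.
print "separate /g matches:   @{[ repeated($name) ]}\n";
```

With the quantifier, /g has nothing further to find because the first match already swallowed every group; without it, each "N." is its own match and each one returns its own capture.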

        If you want to extract an arbitrary number of digits sandwiched between decimal points, you can grab them all, and then split the result.

        /((?:\d+\.)+)/g; split /(?<=\.)/, $1;
        Alternatively,
        [i]n list context, //g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regexp
        (see Global matching in perlretut) so you could try
        my @res = /(?<=\.)\d+\./g;
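Here is a sketch of both suggestions as small subs (the sub names are hypothetical). Note that each piece comes back with its trailing period still attached; the third sub is my own variation, not from the post, which wraps the digits in a capture group so that only they are returned.

```perl
use strict;
use warnings;

# Suggestion 1: capture the whole run of "N." pieces, then split after
# each period using a zero-width lookbehind.
sub via_split {
    my ($s) = @_;
    return $s =~ /((?:\d+\.)+)/ ? split(/(?<=\.)/, $1) : ();
}

# Suggestion 2: no capture group, so list-context //g returns whole matches.
sub via_lookbehind { my ($s) = @_; return $s =~ /(?<=\.)\d+\./g }

# Variation (an assumption on my part): capture only the digits.
sub digits_only { my ($s) = @_; return $s =~ /(?<=\.)(\d+)\./g }

my $name = 'ActionLogs.1.2.3.xml';
print join(' ', via_split($name)),      "\n";   # 1. 2. 3.
print join(' ', via_lookbehind($name)), "\n";   # 1. 2. 3.
print join(' ', digits_only($name)),    "\n";   # 1 2 3
```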

        #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: RegEx Headaches
by kcott (Abbot) on Jun 22, 2013 at 21:06 UTC

    G'day oryx3,

    Welcome to the monastery.

    I see a number of solutions that appear to be more complicated than necessary.

    You've provided two pieces of sample input with expected output for each. In both cases, either of these will achieve what you want:

    /\.(\d+)/g
    /(\d+)\./g

    Here's my test:

    $ perl -Mstrict -Mwarnings -de 1

    Loading DB routines from perl5db.pl version 1.39_09
    Editor support available.

    Enter h or 'h h' for help, or 'man perldebug' for more help.

    main::(-e:1):   1
      DB<1> $_ = 'ActionLogs.1.1998.xml'

      DB<2> x /\.(\d+)/g
    0  1
    1  1998
      DB<3> x /(\d+)\./g
    0  1
    1  1998
      DB<4> $_ = 'ActionLogs.1.2.3.4.5.6.7.8.9.xml'

      DB<5> x /\.(\d+)/g
    0  1
    1  2
    2  3
    3  4
    4  5
    5  6
    6  7
    7  8
    8  9
      DB<6> x /(\d+)\./g
    0  1
    1  2
    2  3
    3  4
    4  5
    5  6
    6  7
    7  8
    8  9
      DB<7> q

    -- Ken

      ... more complicated than necessary.

      Since the decimal digit groups appear to be unambiguously delimited to begin with, no need to worry about a delimiter or capture group at all:

      >perl -wMstrict -le
      "$_ = 'ActionLogs.1.22.333.4.5.6.7.8.987.xml';
       ;;
       my @digit_groups = m{ \d+ }xmsg;
       printf qq{'$_' } for @digit_groups;
      "
      '1' '22' '333' '4' '5' '6' '7' '8' '987'

        ++ Yes, that works and is less complicated still. :-)

        It wouldn't have occurred to me not to use a capture group. I checked the online docs and found in perlretut - Using regular expressions in Perl - Global matching (after following links from perlre):

        In list context, //g returns a list of matched groupings, or if there are no groupings, a list of matches to the whole regexp. [my emphasis]

        That seemed new to me (those links are for 5.16.2), so I checked back to the earliest online perldoc version (5.8.8) and, while in a different manpage (http://perldoc.perl.org/5.8.8/perlop.html#Regexp-Quote-Like-Operators) with different wording, that behaviour was current back then:

        In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern. [my emphasis, again]
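The documented behaviour can be checked with a small sketch like the one below. The helper all_matches and the two-group pattern are made up for illustration; the point is only how the returned list changes with the number of capture groups.

```perl
use strict;
use warnings;

# Hypothetical helper: apply a compiled pattern globally in list context.
sub all_matches {
    my ($re, $s) = @_;
    return $s =~ /$re/g;
}

my $name = 'ActionLogs.1.22.333.xml';

# No groups: each element is a whole match.
print join(',', all_matches(qr/\d+/, $name)), "\n";        # 1,22,333

# One group: each element is that group's capture.
print join(',', all_matches(qr/\.(\d+)/, $name)), "\n";    # 1,22,333

# Two groups: every match contributes two elements, in group order.
print join(',', all_matches(qr/(\.)(\d+)/, $name)), "\n";  # .,1,.,22,.,333
```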

        I've learned something new. Thank you.

        -- Ken
