Re: How to get ($1, $2, ...)?

by wfsp (Abbot)
on Feb 16, 2007 at 15:07 UTC (#600452)

in reply to How to get ($1, $2, ...)?

Perhaps put your regexes in an array and loop over them?
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @res = (
    qr/Title: (.*?), Author: (\w+) (\w+)$/,
    qr/Title: (.*?), Author: (\w+) (\w+) Publisher: (\w+)$/,
    qr/Title: (.*?), Author: (\w+) (\w+) Publisher: (\w+) Year: (\w+)$/,
);

my @answers;
while (my $line = <DATA>){
    for my $re (@res){
        my @results;
        if (@results = $line =~ /$re/){
            push @answers, [@results];
        }
    }
}
print Dumper \@answers;

__DATA__
Title: The Moor's Last Sigh, Author: Salman Rushdie
Title: The God of Small Things, Author: Arundhati Roy
Title: one, Author: two three Publisher: four
Title: five, Author: six seven Publisher: eight Year: nine

Output:

$VAR1 = [
  [ 'The Moor\'s Last Sigh', 'Salman', 'Rushdie' ],
  [ 'The God of Small Things', 'Arundhati', 'Roy' ],
  [ 'one', 'two', 'three', 'four' ],
  [ 'five', 'six', 'seven', 'eight', 'nine' ]
];
Update: tinkered with the format of the output.
Update 2: forgot to update the code. :-( Thanks to Tanktalus for spotting it.

Re^2: How to get ($1, $2, ...)?
by Anno (Deacon) on Feb 16, 2007 at 16:28 UTC
    Reproducing a bit of your code:
        my @answers;
        while (my $line = <DATA>){
            for my $re (@res){
                my @results;
                if (@results = $line =~ /$re/){
                    push @answers, ["@results"];
    Why the quotes around @results? They weren't in the version that produced the output you're showing.
                }
            }
        }
    You're also making an unnecessary copy of the array @results. Its scope is the loop body, so you have a new one each time through. Just take the reference:
        # ...
        for my $re (@res){
            my @results;
            push @answers, \ @results if @results = $line =~ $re;
        }
        # ...


      Oh, or even
          # ...
          push @answers, grep @$_, map [ $line =~ $_], @res;
          # ...
      instead of the for loop over @res.

      I realize I'm expanding on a non-solution to the original question. It's art for art's sake, if that's allowed.


Re^2: How to get ($1, $2, ...)?
by ferreira (Chaplain) on Feb 16, 2007 at 16:15 UTC

    That won't do. I am interested in the order of the regexes and in resuming from where the previous one left off. If one uses /$re/, the search is reset each time. With /$re/gc, in turn, I can write code that looks for things such as /Title: (.*?)$/, Author: (.*?), and Publisher: (.*?), but does not accept them if they come out of order (like "Publisher... Title... Author...").
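A minimal sketch of the resume-and-enforce-order behaviour described above. The helper name `match_in_order` and the three field patterns are made up for illustration; the sample strings reuse the thread's test data. Each pattern is matched in scalar context with /gc, so the search for one field starts where the previous field's match ended.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: try each pattern in turn
# against the same string, each search starting where the previous
# match ended. Returns an array ref holding the first capture of each
# pattern, or undef as soon as one pattern fails.
sub match_in_order {
    my ($text, @patterns) = @_;
    my @captured;
    pos($text) = 0;
    for my $re (@patterns) {
        # Scalar-context /g advances pos($text) on success; /c keeps
        # pos($text) where it was when a match fails.
        return undef unless $text =~ /$re/gc;
        push @captured, $1;
    }
    return \@captured;
}

# Assumed field patterns, one capture group each.
my @fields = (qr/Title: (.*?),/, qr/Author: (\w+ \w+)/, qr/Publisher: (\w+)/);

my $in_order     = match_in_order('Title: one, Author: two three Publisher: four', @fields);
my $out_of_order = match_in_order('Publisher: four Title: one, Author: two three', @fields);

print defined($in_order)     ? "in order: @$in_order\n" : "in order: rejected\n";
print defined($out_of_order) ? "out of order: @$out_of_order\n" : "out of order: rejected\n";
```

In the second call the Publisher pattern is only tried after the Author match, so the leading "Publisher: four" is never reconsidered and the record is rejected.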

    I have been thinking that I should have phrased this question differently, asking directly for a way to get ($1, $2, ...) in a generic manner and then showing the code for sub _groups. The background that inspired me to formulate the problem could be added as a complement, without obscuring what I was looking for.
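The `_groups` sub mentioned above is not shown in this thread, so the following is only a guess at what such a helper could look like. The name `match_groups` and its interface (a reference to the string plus a compiled regex) are assumptions. It relies on the built-in `@-` and `@+` offset arrays and on `$#+`, which together describe every capture group of the last successful match, so the groups can be collected even when the match runs in scalar context with /gc.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a "give me ($1, $2, ...)" helper. It runs
# one scalar-context /gc match against the string behind $text_ref and
# returns an array ref of all capture groups, or undef on failure.
sub match_groups {
    my ($text_ref, $re) = @_;
    return undef unless $$text_ref =~ /$re/gc;
    my @groups;
    # $#+ is the number of groups in the pattern; @- and @+ hold the
    # start and end offsets of each group (undef if it did not take part).
    for my $n (1 .. $#+) {
        push @groups, defined($-[$n])
            ? substr($$text_ref, $-[$n], $+[$n] - $-[$n])
            : undef;
    }
    return \@groups;
}

my $text = "Title: one, Author: two three\nTitle: five, Author: six seven\n";
my $re   = qr/Title: (.*?), Author: (\w+) (\w+)$/m;

# Because pos() lives on $text itself, each call resumes after the last match.
my $first  = match_groups(\$text, $re);
my $second = match_groups(\$text, $re);
my $third  = match_groups(\$text, $re);   # nothing left to match

print "first:  @$first\n";
print "second: @$second\n";
```

Passing a reference keeps pos() attached to the caller's string, so successive calls walk through the text instead of starting over.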

      If you really need to hang on to the //gmc regex construct, you could combine the regexes as alternatives in a single pattern. Afterwards, split the captured groups per regex, based on each group's position in the result list.
      - Note that the order of the alternatives determines which one is tried first at each position (and that's what you wanted, right?).
      - Since the whole thing is a single expression, your program scans the text only once, which should be a performance gain.

      See below for an example to get the idea.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $text = <<TEXT;
      Title: The Moor's Last Sigh, Author: Salman Rushdie
      Title: The God of Small Things, Author: Arundhati Roy
      Title: A very special title, Author: varianf varians
      TEXT

      # /m goes on each qr// because a qr object carries its own flags;
      # the /m on the combined match below does not reach inside it.
      my $re  = qr/Title: (.*?), Author: (\w+) (\w+)$/m;              # 3 groups here
      my $re2 = qr/Title: (.*?special.*?), Author: (\w+) (\w+)$/m;    # 3 more groups

      # In list context every match contributes all 6 groups,
      # with undef for the alternative that did not match.
      my (@MatchAll) = ($text =~ /$re2|$re/mgc);

      my (@Match1, @Match2);
      for (my $i = 0; $i < @MatchAll; $i = $i + 6) {
          # Array slices need the @ sigil.
          defined $MatchAll[$i]   && push @Match2, @MatchAll[$i   .. $i+2];
          defined $MatchAll[$i+3] && push @Match1, @MatchAll[$i+3 .. $i+5];
      }
      print Dumper \@Match2;
      print Dumper \@Match1;

      Output:

      $VAR1 = [ 'A very special title', 'varianf', 'varians' ];
      $VAR1 = [ 'The Moor\'s Last Sigh', 'Salman', 'Rushdie', 'The God of Small Things', 'Arundhati', 'Roy' ];
      P.S.: I hardcoded the boundaries for the captured fields to shortcut the coding here. Naturally this part could/should be coded more flexibly if you deal with a lot of regexes.
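One possible way to avoid the hardcoded boundaries is sketched below. The helper names `count_groups` and `split_matches` are made up for illustration. The sketch assumes that every regex has at least one capture group, that none of them uses named or numbered backreferences, and that at least one group takes part whenever a regex matches. The group count comes from matching the pattern with an extra empty alternative against an empty string and then reading `$#+`.

```perl
use strict;
use warnings;

# Hypothetical helper: number of capture groups in a compiled regex.
# The empty alternative always matches, and $#+ then reports how many
# groups the pattern contains, whether or not they took part.
sub count_groups {
    my ($re) = @_;
    "" =~ /$re|/ or die "empty alternative failed to match";
    return $#+;
}

# Hypothetical helper: match all regexes as one alternation and hand
# back one list of matches (array refs of groups) per regex.
sub split_matches {
    my ($text, @res) = @_;
    my @counts = map { count_groups($_) } @res;
    my $stride = 0;
    for my $count (@counts) {
        die "every regex needs at least one capture group" unless $count;
        $stride += $count;
    }

    my $alternation = join '|', @res;          # qr objects keep their own flags
    my @flat = $text =~ /$alternation/g;       # $stride entries per match

    my @per_regex;
    push @per_regex, [] for @res;
    for (my $base = 0; $base < @flat; $base += $stride) {
        my $offset = $base;
        for my $k (0 .. $#res) {
            my @groups = @flat[$offset .. $offset + $counts[$k] - 1];
            # Only the alternative that matched has defined groups.
            push @{ $per_regex[$k] }, [@groups] if grep { defined $_ } @groups;
            $offset += $counts[$k];
        }
    }
    return @per_regex;
}

my $text = <<"TEXT";
Title: The Moor's Last Sigh, Author: Salman Rushdie
Title: The God of Small Things, Author: Arundhati Roy
Title: A very special title, Author: varianf varians
TEXT

my $general = qr/Title: (.*?), Author: (\w+) (\w+)$/m;
my $special = qr/Title: (.*?special.*?), Author: (\w+) (\w+)$/m;

# Order matters: the more specific pattern is tried first.
my @per_regex = split_matches($text, $special, $general);
printf "%d special, %d general\n", scalar @{ $per_regex[0] }, scalar @{ $per_regex[1] };
```

Unlike the flat lists in the example above, this keeps each match as its own array ref, which makes it easier to tell where one record ends and the next begins.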

      Since he is comparing line by line instead of the whole document at once, it doesn't matter that the next regex starts at the beginning even if the last one matched. I know that often my problem isn't getting Perl to do what I want; it is thinking I want Perl to do one thing when really there is a better solution. That's why it is good to provide your actual problem: someone might see a solution you are missing, or at the very least the insight into the problem will allow people to agree that you are doing it the best way. Either way you get good information!

      Eric Hodges

      I'd try a combination of m//g in scalar context and using the \G marker. If necessary, you can control where it matches by setting pos().

      Sorry for not presenting a coded solution, I don't understand your problem well enough to give one.
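For readers who want to see the suggestion in code, here is a rough sketch of m//g in scalar context combined with \G and pos(). The field patterns, the `parse_record` name and the returned structure are all assumptions, since the thread does not pin down the exact input format. Because each pattern begins with \G, it can only match exactly at the current pos(), so fields must follow one another with nothing skipped in between.

```perl
use strict;
use warnings;

# Hypothetical field patterns; each starts with \G so it can only match
# exactly at pos(), never further along the string.
my @steps = (
    [ title     => qr/\GTitle: (.*?), /   ],
    [ author    => qr/\GAuthor: (\w+ \w+)/ ],
    [ publisher => qr/\G Publisher: (\w+)/ ],
);

sub parse_record {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;    # assign a different offset here to start elsewhere
    for my $step (@steps) {
        my ($name, $re) = @$step;
        # /c preserves pos() on failure, so stopped_at shows how far we got.
        last unless $line =~ /$re/gc;
        push @fields, [ $name, $1 ];
    }
    return { fields => \@fields, stopped_at => pos($line) };
}

my $full    = parse_record('Title: one, Author: two three Publisher: four');
my $partial = parse_record('Title: five, Author: six seven');
my $wrong   = parse_record('Publisher: four Title: one, Author: two three');

for my $record ($full, $partial, $wrong) {
    my @pairs = map { "$_->[0]=$_->[1]" } @{ $record->{fields} };
    print scalar(@pairs), " field(s): @pairs (stopped at $record->{stopped_at})\n";
}
```

The third record yields no fields at all: with \G the Title pattern is not allowed to skip over the leading "Publisher: four".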

