PerlMonks  

Regex, how to pull out multiple matches per line into array?

by bulrush (Scribe)
on Aug 04, 2015 at 17:08 UTC ( [id://1137401]=perlquestion )

bulrush has asked for the wisdom of the Perl Monks concerning the following question:

I call <this> a macro in my text file.

I need to find all matches of a regex in a single scalar and put each match into its own element of an array. I'm trying to pull out strings that look like /<(ig|igt|igo|igxo);.+?>/. Basically, I need to pull out macros that begin with "<ig;", "<igt;", "<igo;", or "<igxo;" and end with ">". This is my attempt.

    my($re,$lin,@g,@inarr);
    my(@inarr)=(
        'stuff{tag}<igo;123>stuff<ig;abc>',
        '<igt;ddd>stuff blah {foo}',
        'stuff blah foo <igxo;dsldkd.eps>',
        '<igt;aaa>stuff blah <igx;hhh>'
    );
    $re='<(ig|igt|igo|igxo);.+?>';
    foreach $lin (@inarr) {
        @g=($lin=~m/$re/g);
    } # foreach

After this runs on $inarr[0], @g should contain:
@g=('<igo;123>', '<ig;abc>' );
but I'm getting:
@g=('ig');
What am I doing wrong here? I've already searched Google but haven't found what I'm looking for yet.

Thank you.

Replies are listed 'Best First'.
Re: Regex, how to pull out multiple matches per line into array?
by toolic (Bishop) on Aug 04, 2015 at 17:23 UTC
    Non capturing groupings
    use warnings;
    use strict;
    use Data::Dumper;

    my ( $re, $lin, @g );
    my (@inarr) = (
        'stuff{tag}<igo;123>stuff<ig;abc>',
        '<igt;ddd>stuff blah {foo}',
        'stuff blah foo <igxo;dsldkd.eps>',
        '<igt;aaa>stuff blah <igx;hhh>'
    );
    $re = '<(?:ig|igt|igo|igxo);.+?>';
    foreach $lin (@inarr) {
        @g = ( $lin =~ m/$re/g );
        print Dumper( \@g );
    }    # foreach

    __END__
    $VAR1 = [
              '<igo;123>',
              '<ig;abc>'
            ];
    $VAR1 = [
              '<igt;ddd>'
            ];
    $VAR1 = [
              '<igxo;dsldkd.eps>'
            ];
    $VAR1 = [
              '<igt;aaa>'
            ];
      I read the link you gave and now see what you did. Thanks.
Re: Regex, how to pull out multiple matches per line into array?
by choroba (Cardinal) on Aug 04, 2015 at 17:24 UTC
    A match in list context returns the capture groups ($1, $2, ...), which are created by parentheses in the regular expression. You should add another pair that spans the whole macro, and turn the inner group into a non-capturing one so it doesn't pollute the results:
    #!/usr/bin/perl
    use warnings;
    use strict;

    my @inarr = (
        'stuff{tag}<igo;123>stuff<ig;abc>',
        '<igt;ddd>stuff blah {foo}',
        'stuff blah foo <igxo;dsldkd.eps>',
        '<igt;aaa>stuff blah <igx;hhh>',
    );
    my $re = qr/( < (?: ig | igt | igo | igxo ) ; .+? > )/x;
    for my $lin (@inarr) {
        my @g = $lin =~ m/$re/g;
        print join ' ', map "[$_]", @g;
        print "\n";
    }
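
To see the three behaviours side by side, here is a minimal sketch that runs the original capturing pattern, the non-capturing variant, and the outer-capture variant against the first input line. The helper name all_matches is an assumption made for this illustration and does not appear in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return everything that a
# list-context m//g match of $re against $text produces.
sub all_matches {
    my ( $text, $re ) = @_;
    my @found = $text =~ /$re/g;
    return @found;
}

my $line = 'stuff{tag}<igo;123>stuff<ig;abc>';

# Capturing inner group: only the contents of the group come back.
my @inner = all_matches( $line, qr/<(ig|igt|igo|igxo);.+?>/ );

# No capture groups at all: the whole text of each match comes back.
my @whole = all_matches( $line, qr/<(?:ig|igt|igo|igxo);.+?>/ );

# One outer capture around a non-capturing inner group: same result.
my @outer = all_matches( $line, qr/(<(?:ig|igt|igo|igxo);.+?>)/ );

print "inner: @inner\n";    # inner: igo ig
print "whole: @whole\n";    # whole: <igo;123> <ig;abc>
print "outer: @outer\n";    # outer: <igo;123> <ig;abc>
```

Either fix works here; the outer capture becomes the more useful form once the pattern needs other groups whose contents should not end up in the result list.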
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re: Regex, how to pull out multiple matches per line into array?
by GrandFather (Saint) on Aug 04, 2015 at 20:59 UTC

    Don't declare all your variables in a single statement ($re, $lin, ...). That almost completely negates the scope checking provided by strict and tells readers of the code nothing about the correct scope of the variables. Instead, declare each variable where it is first used:

    my (@inarr) = (
        'stuff{tag}<igo;123>stuff<ig;abc>',
        '<igt;ddd>stuff blah {foo}',
        'stuff blah foo <igxo;dsldkd.eps>',
        '<igt;aaa>stuff blah <igx;hhh>'
    );
    my $re = '<(ig|igt|igo|igxo);.+?>';

    for my $lin (@inarr) {
        my @g = ($lin =~ m/$re/g);
    }

    Also note use of indentation (Perl Tidy is your friend) so it's easy to see how blocks are nested (and avoid the need to correctly comment } as an aid to matching braces).

    In particular, note that @g is local to the for loop block. There is no simple way to tell that if you declare it outside the loop.

    Even worse, $lin looks like a normal variable declared globally, but isn't. Loop variables are magical (they are aliased to each loop value in turn) and are not the same as a lexical variable global to the for loop that may share the same name.
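
A small sketch of that aliasing, assuming an array named @items that is invented for this illustration: the loop reuses a lexical declared outside it, changes made through the loop variable land in the array, and the outer variable keeps its own value once the loop ends.

```perl
use strict;
use warnings;

my @items = ( 'a', 'b' );    # hypothetical data, not from the thread
my $lin   = 'outer';

# $lin was declared outside the loop, but inside the loop it is an alias
# for each element of @items in turn, so assigning to it changes the array.
for $lin (@items) {
    $lin = uc $lin;
}

print "items: @items\n";    # items: A B
print "lin: $lin\n";        # lin: outer
```

Writing the loop as for my $lin (@items) makes that loop-only scope visible in the code itself.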

    Premature optimization is the root of all job security
