PerlMonks  

getting a regex to return an array of matches?

by rmexico (Novice)
on Feb 14, 2006 at 17:44 UTC

rmexico has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to get a regex to return an array of all its matches? For example:

undef $/;
open INFILE, "<$file";
my @filecontents = <INFILE>;
close INFILE;
foreach my $filepieces (@filecontents) {
    if ($filepieces =~ /SOME_REGEX/smg) {
        # do something w/ match
    }
}

But what if that regex occurs more than once in the file? It'll match and stop at the first instance. How do I get at the other occurrences of what I'm hunting for if I'm slurping the whole file into a variable?

Re: getting a regex to return an array of matches?
by tirwhan (Abbot) on Feb 14, 2006 at 18:07 UTC

    Did you try it out?

    if (@match = $filepieces =~ /SOME_REGEX/smg) {
        # ...
    }
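A minimal sketch of why this works, using the sample line from later in this thread in a hypothetical $text variable instead of a file: in list context, m//g returns the captured text from every match, or every complete match when the pattern has no capture groups. A list assignment used as a condition evaluates to the number of elements on its right-hand side, so it also serves as a "did anything match?" test.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input standing in for the slurped file contents.
my $text = "alpha|beta|alpha|theta|alpha|gamma|alpha|episilon";

# With a capture group, list-context m//g returns the captured text
# from every successful match.
my @captured = $text =~ /alpha\|([^|]+)\|/g;

# Without capture groups, it returns each complete match instead.
my @whole = $text =~ /alpha\|[^|]+\|/g;

print "captured: @captured\n";
print "whole:    @whole\n";

# The list assignment yields the number of matches, so it works as a test.
if (my @match = $text =~ /alpha\|([^|]+)\|/g) {
    print "found ", scalar(@match), " matches\n";
}
```

The final "alpha|episilon" is not captured because the pattern requires a trailing pipe.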

    All dogma is stupid.
      I did try this, which doesn't work:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;
      use diagnostics;

      undef $/;
      my $file = "test.txt";
      open INFILE, "<$file" or die "cant open file: $!\n";
      my @arr = <INFILE>;
      close INFILE;
      foreach my $filepieces (@arr) {
          my @matches = $1 if ($filepieces =~ /alpha\|(.*?)\|/smg);
          if (@matches) {
              foreach my $match (@matches) {
                  print "match: $match\n";
              }
          }
      }

      and the file:
      alpha|beta|alpha|theta|alpha|gamma|alpha|episilon
      That's just an example; splitting on the pipe isn't an option, because the real files I need to parse don't look like that.

        This will:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;
        use diagnostics;

        undef $/;
        my $text = <DATA>;
        my @matches;
        if (@matches = $text =~ /alpha\|([^|]+)\|/smg) {
            foreach my $match (@matches) {
                print "match: $match\n";
            }
        }
        __DATA__
        alpha|beta|alpha|theta|alpha|gamma|alpha|episilon

        Output:

        match: beta
        match: theta
        match: gamma

        Or you could dispense with loading data into the array at all (unless you need it for later):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Data::Dumper;
        use diagnostics;

        undef $/;
        my $text = <DATA>;
        while ($text =~ /alpha\|([^|]+)\|/smg) {
            print "match: $1\n";
        }
        __DATA__
        alpha|beta|alpha|theta|alpha|gamma|alpha|episilon
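The while loop works because, in scalar context, each evaluation of m//g resumes where the previous match ended; that offset is available through the built-in pos(). A minimal sketch, again holding the sample line in a string rather than DATA, that records each capture together with the position the next search will start from:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "alpha|beta|alpha|theta|alpha|gamma|alpha|episilon";

my @found;    # each element: [captured text, offset just past the match]
while ($text =~ /alpha\|([^|]+)\|/g) {
    # pos($text) is where the next /g search on $text will begin.
    push @found, [ $1, pos($text) ];
    print "match: $1 (next search starts at offset ", pos($text), ")\n";
}

# Once a /g match fails, pos() is reset to undef.
print "pos after loop: ", defined pos($text) ? pos($text) : 'undef', "\n";
```

Because the position resets after the last failed match, running the same loop again starts from the beginning of the string.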

        All dogma is stupid.

        Never do any of the following:
            my $var = ... if ...;
            my $var = ... unless ...;
            my $var = ... for ...;
            my $var = ... foreach ...;
            my $var = ... while ...;

        Split each into two statements, for example:
            my $var;
            $var = ... if ...;

        Update: Well, this is bothering me. I can't remember why, and I can't find an example where the above fails.

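        Update: diotalevi, who I believe knows Perl's internals, mentions: "my() has a runtime effect. You *always* (unless you're doing something freakish) want that to happen. The my $var STATEMENT-MODIFIER allows the runtime part of my() to potentially not happen." In other words, the variable may not be freshly initialized each time the statement runs. perlsyn says the same thing: the behavior of my combined with a statement-modifier conditional or loop is undefined, and the variable may hold undef, a previously assigned value, or anything else.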
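Whatever the internals, the two-statement form behaves predictably. A minimal sketch, assuming a hypothetical list of input lines in which the middle one does not match: because my $value; runs on every pass, the non-matching line reliably yields undef rather than anything left over from an earlier pass.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the middle line deliberately has no match.
my @lines = ("alpha|beta|", "no match here", "alpha|gamma|");

my @results;
for my $line (@lines) {
    # Two statements: the declaration always runs, so $value starts out
    # undef on every iteration; the assignment happens only on a match.
    my $value;
    $value = $1 if $line =~ /alpha\|([^|]+)\|/;

    push @results, defined $value ? $value : 'undef';
}

print "$_\n" for @results;
```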

Re: getting a regex to return an array of matches?
by swampyankee (Parson) on Feb 14, 2006 at 18:13 UTC

    Setting $/ to undef will ignore the end-of-line markers in the file, so @filecontents will be an array containing the entire file in one element. Your foreach loop has only one item to work on, and will exit after processing it.

    Try correcting that by leaving $/ at its default value, so @filecontents has one line per element.
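A minimal sketch of the difference, assuming a hypothetical three-line file held in memory (an in-memory filehandle opened on a scalar reference) so nothing has to exist on disk. It also uses local $/ inside a block instead of a bare undef $/, which tends to be safer because the separator is restored when the block exits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical three-line file, held in memory so the example is self-contained.
my $content = "alpha|beta|\nalpha|theta|\nalpha|gamma|\n";

# Default $/ ("\n"): list-context readline returns one line per element.
open my $fh, '<', \$content or die "can't open in-memory file: $!";
my @lines = <$fh>;
close $fh;

# Slurp mode: with $/ undefined, the whole file comes back as one element.
my @slurped = do {
    local $/;    # undef only inside this block
    open my $sfh, '<', \$content or die "can't open in-memory file: $!";
    <$sfh>;
};

print "default \$/: ", scalar(@lines), " elements\n";
print "slurp mode:  ", scalar(@slurped), " element(s)\n";
```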


    added in update
    rmexico's (belated) explanation of why s?he's setting $/ to undef tends to make my suggestion rather pointless.

    Since the entire contents of the file will be read into a scalar, tirwhan's solution to use a regex to get an array from the scalar is much closer to the OP's requirement.

    emc

    "When in doubt, use brute force." -- Ken Thompson
      The reason i'm undef'ing the input record separator ($/) is that i'm parsing files that could have lines spanning more than one line, so the normal "\n" separator isn't applicable.
        i'm parsing files that could have lines spanning more than one line

        I think a moment's thought will tell you there's a big problem with that statement!

        If you have "lines" which span more than one "line", then you're going to have to define for us what the word "line" means for you -- first at the start of the sentence and then what it means at the end of the sentence.

        To put it another way, perl is happy to let you decide what the end of a line looks like, but you don't appear to know yourself.
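As a sketch of letting perl decide what the end of a "line" looks like: suppose, purely hypothetically, that each logical record may span several physical lines and ends with a line containing only %%. Setting $/ to that terminator makes each readline return one whole record, and chomp removes the current $/ rather than just "\n".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data: each record may span several physical lines and
# ends with a line containing only "%%".
my $content = "alpha|beta\ncontinued|\n%%\nalpha|theta|\n%%\n";

my @records;
{
    local $/ = "%%\n";    # a "line" now ends at %%\n instead of \n
    open my $fh, '<', \$content or die "can't open in-memory file: $!";
    while (my $record = <$fh>) {
        chomp $record;     # removes the current $/, i.e. "%%\n"
        push @records, $record;
    }
    close $fh;
}

printf "%d records\n", scalar @records;
print "record: [$_]\n" for @records;
```

Each record can then be searched with m//g in list context, as in the earlier replies.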



        ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
        =~y~b-v~a-z~s; print
Re: getting a regex to return an array of matches?
by Praveen (Friar) on Feb 15, 2006 at 05:44 UTC
    Change this line to use 'while' instead of 'if':
    while ($filepieces =~ /SOME_REGEX/smg) {
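With if, the body runs at most once; with while, a /g match in scalar context resumes where the last one stopped, so the body runs once per match. A minimal sketch using the thread's sample line, which also shows one pitfall: a /g match inside an if leaves pos() set, so a later /g match on the same string starts partway through unless the position is reset.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "alpha|beta|alpha|theta|alpha|gamma|alpha|episilon";

my @with_if;
if ($text =~ /alpha\|([^|]+)\|/g) {
    push @with_if, $1;       # runs once, for the first match only
}

pos($text) = undef;          # reset the /g position the if left behind

my @with_while;
while ($text =~ /alpha\|([^|]+)\|/g) {
    push @with_while, $1;    # runs once per match
}

print "if:    @with_if\n";
print "while: @with_while\n";
```

Without the pos($text) = undef line, the while loop would start just after "alpha|beta|" and miss the first match.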

Node Type: perlquestion [id://530176]
Approved by Corion