PerlMonks  

A regex question.

by FeistyLemur (Acolyte)
on Jul 31, 2015 at 21:49 UTC ( [id://1137070]=perlquestion )

FeistyLemur has asked for the wisdom of the Perl Monks concerning the following question:

This is my first post here. I have a question about something I can't explain.

I'm pretty sure I know what I did wrong; I'm just not 100% sure why it was wrong, and I wanted to clarify that to better my understanding of regex in Perl, as I'm fairly inexperienced and looking to improve.

I was searching for vlan ids for removal in the output of "ip addr" in the following way.

    my $ethernet=`ip addr`;
    if ($ethernet!~/eth1\.\d{4}\@/gm){
        print "No vlans exist on device.\n";
        exit;
    }
    while ($ethernet=~/(eth1\.\d{4})\@/gm) {
        print `ip link del $1`;
        print "Removed $1\n";
    }

Doing this as above mostly worked, but it would always miss the first entry for no reason I can explain. So if there were 25 vlan entries in the string from 1001 to 1025 it would match and delete 1002-1025 without fail, and miss 1001 every time.

Changing line 2 to:

if ($ethernet!~/eth1\.\d{4}\@/){

Does what I intended to do.

I had the same problem with a similar line I was using to scrub IP addresses off the vlan devices: it also missed the first match. I just don't understand why the /gm flag on the preceding if statement's match breaks the while loop in this way, and was hoping someone could explain. Thanks in advance.

Replies are listed 'Best First'.
Re: A regex question.
by AnomalousMonk (Archbishop) on Jul 31, 2015 at 22:00 UTC

    Because you're using the  /g regex modifier for all matches, the string match position never resets to the start of the string, but advances for each and every match performed. (Update: I should have written: advances for each and every successful match performed.)

    c:\@Work\Perl\monks>perl -wMstrict -le
    "my $s = 'xx111yy22xx333yy44zz555xx666';
     ;;
     print qq{first capture: '$1'} if $s =~ m{ (\d+) }xmsg;
     ;;
     print qq{subsequent captures: '$1'} while $s =~ m{ (\d+) }xmsg;
    "
    first capture: '111'
    subsequent captures: '22'
    subsequent captures: '333'
    subsequent captures: '44'
    subsequent captures: '555'
    subsequent captures: '666'

    The fact that the first match in the OPed code is a negative | negated match (so no action is taken) does not matter; it's still a  /g match against the same string. Eliminating the  /g modifier on the first match will cause string match position to be reset before subsequent matches (which must still use /g).

    Note that use of the  /g modifier in an
        if ($string =~ m{ ... }xmsg) { ... }
    statement is useless except as a way to introduce exactly the behavior you are seeing!
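To make the effect concrete, here is a minimal sketch that reproduces the skipped first entry. The sample text is made up to stand in for real `ip addr` output, and `collect_vlans` is a hypothetical helper, not code from the original post.

```perl
use strict;
use warnings;

# Made-up sample standing in for `ip addr` output (an assumption for illustration).
my $sample = join '', map { "eth1.$_\@eth1: <BROADCAST>\n" } 1001 .. 1003;

# Hypothetical helper: run the guard (with or without /g), then collect matches.
sub collect_vlans {
    my ($ethernet, $guard_uses_g) = @_;
    my @found;

    if ($guard_uses_g) {
        # Scalar-context /g match: on success it leaves pos($ethernet)
        # just past the first match, even though the result is negated.
        return @found if $ethernet !~ /eth1\.\d{4}\@/gm;
    }
    else {
        # Without /g the guard does not advance the match position.
        return @found if $ethernet !~ /eth1\.\d{4}\@/m;
    }

    # The loop resumes from wherever pos($ethernet) currently is.
    while ($ethernet =~ /(eth1\.\d{4})\@/gm) {
        push @found, $1;
    }
    return @found;
}

my @with_g    = collect_vlans($sample, 1);
my @without_g = collect_vlans($sample, 0);

print "guard with /g:    @with_g\n";
print "guard without /g: @without_g\n";
```

With the /g guard the loop should report only eth1.1002 and eth1.1003; without it, all three entries are found.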

    Update: A number of incremental updates made.


    Give a man a fish:  <%-(-(-(-<

      Hm, well I really didn't understand that about how the match operator works so I guess that would explain it. I had assumed /g was pointless on the if after thinking about it, but the part I really couldn't figure out was why the negative match would affect the subsequent one in any way.

        Hello FeistyLemur, and welcome to the Monastery!

        See the section “Global matching” in perlretut#Using-regular-expressions-in-Perl, and in particular note the following (emphasis added):

        In scalar context, successive invocations against a string will have //g jump from match to match, keeping track of position in the string as it goes along....
        A failed match or changing the target string resets the position.... The current position in the string is associated with the string, not the regexp.
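The two points in that passage can be checked with pos(). This is only a small sketch on a made-up string; the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = 'aa11bb22';

# First scalar-context /g match: finds '11' and records the position after it.
$s =~ /\d+/g;
my $after_digits = pos($s);

# A different pattern on the same string resumes from that position,
# because the position belongs to the string, not to the regexp.
$s =~ /([a-z]+)/g;
my $letters = $1;    # 'bb', not 'aa'

# A failed /g match resets the position to undef ...
my $matched       = $s =~ /zzz/g;
my $after_failure = pos($s);

# ... so the next /g match starts over from the beginning of the string.
$s =~ /([a-z]+)/g;
my $restart = $1;    # 'aa'

print "pos after digits:  $after_digits\n";
print "second pattern:    $letters\n";
print "pos after failure: ", (defined $after_failure ? $after_failure : 'undef'), "\n";
print "after the reset:   $restart\n";
```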

        Hope that helps,

        Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

        ... the negative match ...

        It's important to understand that the negation is made on the result of the  /eth1\.\d{4}\@/gm match against the  $ethernet string. The match itself is actually successful! It's as if the
            if ($ethernet !~ /eth1\.\d{4}\@/gm) { ... }
        statement had been written as
            if ( ! ($ethernet =~ /eth1\.\d{4}\@/gm)) { ... }
        (note =~ vice !~). In fact, that's pretty much the way Perl sees the code:

        c:\@Work\Perl\monks>perl -wMstrict -MO=Deparse,-p -le
        "my $ethernet=`ip addr`;
         if ($ethernet!~/eth1\.\d{4}\@/gm){
             print qq{No vlans exist on device.\n};
             exit;
         }
        "
        BEGIN { $^W = 1; }
        BEGIN { $/ = "\n"; $\ = "\n"; }
        use strict 'refs';
        (my $ethernet = `ip addr`);
        unless (($ethernet =~ /eth1\.\d{4}\@/gm)) {
            print("No vlans exist on device.\n");
            exit;
        }
        -e syntax OK
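The same equivalence can be observed at run time by checking pos() after each form. A rough sketch, using a made-up one-line sample rather than real `ip addr` output:

```perl
use strict;
use warnings;

# Made-up sample line (an assumption for illustration).
my $text = "eth1.1001\@eth1 eth1.1002\@eth1";

# Guard written with !~ and /g, as in the original post.
my $negated           = ($text !~ /eth1\.\d{4}\@/g);
my $pos_after_negated = pos($text);

# Start over, then spell the same test as an explicit negation of =~.
pos($text) = undef;
my $explicit           = !($text =~ /eth1\.\d{4}\@/g);
my $pos_after_explicit = pos($text);

# Both forms evaluate to false (the underlying match succeeded), and both
# leave the match position just past the first 'eth1.1001@'.
printf "!~   : result=%s pos=%d\n", ($negated  ? 'true' : 'false'), $pos_after_negated;
printf "!(=~): result=%s pos=%d\n", ($explicit ? 'true' : 'false'), $pos_after_explicit;
```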

        I had assumed /g was pointless on the if ...

        Given what you were trying to do, it was pointless, but it was not without effect!


        Give a man a fish:  <%-(-(-(-<

Re: A regex question.
by Aldebaran (Curate) on Aug 01, 2015 at 08:52 UTC

    It seems to me that your problem is in your control mechanisms, specifically that your if statement should be inside of a while statement that goes through your data line by line. Also maybe a global variable to keep track of matches. If you're reading a file, then you'll have something like:

    my $matches = 0;
    open FILE, ">filename.txt" or die $!;
    while (my $ethernet = <FILE>) {
        if ($ethernet=~/eth1\.\d{4}\@/m){
            ## do something
            $matches++;
        }
    }

      This won't work because you are opening the filehandle only for writing. In fact it will clobber the contents of the file.

      It should be:

      # open FILE, ">filename.txt" or die $!; #output only, and deprecated syntax
      open my $file, '<', 'filename.txt' or die $!;
      while (my $ethernet = <$file>) {
          ## do something
      }

      The way forward always starts with a minimal test.
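For completeness, here is a self-contained sketch of the line-by-line approach with a read-only, three-argument open. The temporary file and its contents are assumptions made so the example can run anywhere; they are not real `ip addr` output.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small made-up input file so the sketch is self-contained.
my ($out, $filename) = tempfile(UNLINK => 1);
print {$out} "5: eth1.1001\@eth1: <BROADCAST>\n",
             "6: lo: <LOOPBACK>\n",
             "7: eth1.1002\@eth1: <BROADCAST>\n";
close $out or die "Cannot close $filename: $!";

# Three-argument open with a lexical filehandle, read-only.
open my $file, '<', $filename or die "Cannot open $filename: $!";

my $matches = 0;
while (my $line = <$file>) {
    # No /g here: each line is a fresh string and one match per line is enough.
    $matches++ if $line =~ /eth1\.\d{4}\@/;
}
close $file;

print "Matching lines: $matches\n";
```

Because each line is read into a new string, there is no leftover match position to trip over between iterations.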

Node Type: perlquestion [id://1137070]
Approved by Paladin