PerlMonks  

regex corrupting separate regex

by boxofrox (Initiate)
on Feb 12, 2010 at 22:04 UTC ( [id://822941]=perlquestion )

boxofrox has asked for the wisdom of the Perl Monks concerning the following question:

I have consecutive Perl statements that each evaluate a regex, and it appears that the result of one regex corrupts the result of the next.

Below are two test cases showing this unexpected response.
The first test case mimics the original code that introduced this corruption.
The second mimics the workaround I used to get past this issue and get my code working, yet the corruption still occurs if the loop continues after the name search matches.

The output prints the regex with values for the name search, and then for the related version search.
The variable values are enclosed in brackets to identify all characters in the string.

Q: Can anyone explain why this is happening and perhaps what perl thinks it's doing?

Test Case 1
    #!/usr/bin/perl -w
    use strict;

    # create sample task list
    my $tasks = [
        [ "A", "1.1"],
        [ "B", "1.2"],
        [ "C", "2.4"],
        [ "D", "2.9"],
        [ "E", "1.3.1"],
    ];

    my $task_name = "C";                # use to hold task name i'm searching for
    my $task_version = "";              # use to hold task version i'm searching for
    my ($cache_name, $cache_version);   # use to hold data from task list
    my ($match_name, $match_version);   # use to hold result of regex match

    foreach my $array ( @$tasks ) {
        $cache_name = $array->[0];      # get task name
        $cache_version = $array->[1];   # get task version

        # error occurs here
        $match_name = ($cache_name =~ /^$task_name$/i) ? "match" : "no match";
        $match_version = ($cache_version =~ /$task_version/i) ? "match" : "no match";

        info( 0, 24, "[$cache_name] =~ /^$task_name\$/i", $match_name );
        info( 4, 24, "[$cache_version] =~ /$task_version/i", $match_version );
    }

    ## Info Subroutine ##
    sub info {
        my $indent = shift;
        my $width = shift() - $indent;
        my $msg = shift;
        my $result = shift;
        my $margin = " " x $indent;

        printf( "%s%-${width}s | %10s\n",
            $margin,
            $msg ? substr( $msg, 0, 60 ) : "",
            $result ? substr( $result, 0, 10 ) : "" );
    }

    ######## OUTPUT ########
    [A] =~ /^C$/i            |   no match
        [1.1] =~ //i         |      match
    [B] =~ /^C$/i            |   no match
        [1.2] =~ //i         |      match
    [C] =~ /^C$/i            |      match
        [2.4] =~ //i         |   no match   <-- unexpected
    [D] =~ /^C$/i            |   no match
        [2.9] =~ //i         |   no match   <-- unexpected
    [E] =~ /^C$/i            |   no match
        [1.3.1] =~ //i       |   no match   <-- unexpected

Test Case 2
    #!/usr/bin/perl -w
    use strict;

    # create sample task list
    my $tasks = [
        [ "A", "1.1"],
        [ "B", "1.2"],
        [ "C", "2.4"],
        [ "D", "2.9"],
        [ "E", "1.3.1"],
    ];

    my $task_name = "C";                # use to hold task name i'm searching for
    my $task_version = "";              # use to hold task version i'm searching for
    my ($cache_name, $cache_version);   # use to hold data from task list
    my ($match_name, $match_version);   # use to hold result of regex match

    foreach my $array ( @$tasks ) {
        $cache_name = $array->[0];      # get task name
        $cache_version = $array->[1];   # get task version

        # reversing order of regex works around this bug
        $match_version = ($cache_version =~ /$task_version/i) ? "match" : "no match";
        $match_name = ($cache_name =~ /^$task_name$/i) ? "match" : "no match";

        info( 0, 24, "[$cache_name] =~ /^$task_name\$/i", $match_name );
        info( 4, 24, "[$cache_version] =~ /$task_version/i", $match_version );
    }

    ## Info Subroutine ##
    sub info {
        my $indent = shift;
        my $width = shift() - $indent;
        my $msg = shift;
        my $result = shift;
        my $margin = " " x $indent;

        printf( "%s%-${width}s | %10s\n",
            $margin,
            $msg ? substr( $msg, 0, 60 ) : "",
            $result ? substr( $result, 0, 10 ) : "" );
    }

    ######## OUTPUT ########
    [A] =~ /^C$/i            |   no match
        [1.1] =~ //i         |      match
    [B] =~ /^C$/i            |   no match
        [1.2] =~ //i         |      match
    [C] =~ /^C$/i            |      match
        [2.4] =~ //i         |      match   <-- EXPECTED
    [D] =~ /^C$/i            |   no match
        [2.9] =~ //i         |   no match   <-- unexpected
    [E] =~ /^C$/i            |   no match
        [1.3.1] =~ //i       |   no match   <-- unexpected

Replies are listed 'Best First'.
Re: regex corrupting separate regex
by blokhead (Monsignor) on Feb 12, 2010 at 22:20 UTC
    From perlop (under "Quote-like operators"):
    The empty pattern //

    If the PATTERN evaluates to the empty string, the last successfully matched regular expression is used instead. In this case, only the g and c flags on the empty pattern is honoured - the other flags are taken from the original pattern. If no match has previously succeeded, this will (silently) act instead as a genuine empty pattern (which will always match).

    This is an odd "feature" of Perl, and it's very easy to get bitten by it in m/$pattern/. It's happened to me several times. Perhaps a good solution is m/(?:$pattern)/ or m/(?:)$pattern/ or something similar, so that the pattern is never empty.

    Update:

    $ perl -le 'print 1 if "foo" =~ /o/; print 2 if "asdf" =~ //'
    1
    $ perl -le 'print 1 if "foo" =~ /z/; print 2 if "asdf" =~ //'
    2
    $ perl -le 'print 1 if "foo" =~ /o/; print 2 if "asdf" =~ /(?:)/'
    1
    2
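
For illustration, here is a minimal sketch of the `m/(?:$pattern)/` guard applied to a cut-down version of the loop from the question. The helper name `version_matches` is hypothetical (it does not appear in the thread), and the task list is shortened; treat it as one possible arrangement rather than the fix for the original utility.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrapping the interpolated value in (?:...) keeps the
# compiled pattern non-empty even when $task_version is "", so Perl does not
# substitute the last successfully matched pattern.
sub version_matches {
    my ($cache_version, $task_version) = @_;
    return $cache_version =~ /(?:$task_version)/i ? 1 : 0;
}

my $tasks = [ [ "A", "1.1" ], [ "C", "2.4" ], [ "D", "2.9" ] ];
my $task_name    = "C";
my $task_version = "";    # blank on purpose: any version should match

foreach my $task (@$tasks) {
    my ($cache_name, $cache_version) = @$task;

    # Same order as Test Case 1: the name regex runs first.
    my $match_name    = ($cache_name =~ /^$task_name$/i) ? "match" : "no match";
    my $match_version = version_matches($cache_version, $task_version)
                      ? "match" : "no match";

    print "$cache_name: $match_name, $cache_version: $match_version\n";
}
```

With the guard in place, the version check should report "match" for every task, including the ones that come after the name regex has succeeded on "C".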

    blokhead

      Thank you, blokhead. That's what I was trying to find in perlre.
Re: regex corrupting separate regex
by toolic (Bishop) on Feb 12, 2010 at 22:42 UTC
    blokhead has answered your question.

    A better approach is to compare your two name strings using eq and lc instead of a regex:

    $match_name = (lc($cache_name) eq lc($task_name)) ? "match" : "no match";
    For your version comparison, you might consider using index instead of a regex. It is not clear what you are trying to do (why did you initialize the version to an empty string?).
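
A rough sketch of both suggestions together, assuming a substring test is what the version comparison needs. The helper names `name_matches` and `version_matches` are hypothetical. Note that `index` compares literally, so a dot in the version is just a dot, and an empty search string is found at position 0, which means a blank version matches every task.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive exact comparison, no regex involved.
sub name_matches {
    my ($cache_name, $task_name) = @_;
    return lc($cache_name) eq lc($task_name);
}

# Hypothetical helper: literal substring test. index returns -1 when the
# substring is absent and 0 for an empty substring.
sub version_matches {
    my ($cache_version, $task_version) = @_;
    return index(lc($cache_version), lc($task_version)) >= 0;
}

for my $task ( [ "C", "2.4" ], [ "D", "2.9" ] ) {
    my ($name, $version) = @$task;
    printf "%s %s: name %s, version %s\n", $name, $version,
        name_matches($name, "c")      ? "match" : "no match",
        version_matches($version, "") ? "match" : "no match";
}
```

Because neither helper uses a pattern match, the empty-pattern rule quoted from perlop never comes into play.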
      Each task stores results, and tasks are run in groups called jobs. The Perl utility I'm modifying:
      1. takes a job number,
      2. creates a Task List from all tasks in the job,
      3. reads an XML file that specifies the order in which tasks are to be read (<Task Name='A' Version='1.1'/>),
      4. searches the Task List for each task/version in the xml file,
      5. gets the results and appends them to a report file.

      I recently had to create jobs running older tasks. Instead of editing the XML file every time I deviate to other versions, I want to leave the version attribute blank and have the utility just find the task regardless of version. I only put one version of a task in each job, hence the empty version number.

      There's likely a better way to do this, but I'm new to perl and modifying a personal copy of a complicated utility someone else in the company wrote.
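
One way the "blank version means any version" lookup could be sketched, skipping the version test entirely when no version is given. Everything here is an assumption for illustration: `find_task` is a hypothetical name, the task list is a stand-in for the real Task List, and the version is compared for exact equality rather than with the unanchored regex from the test cases.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup, not the utility's actual code: returns the first task
# whose name matches and, when a version is given, whose version is equal.
sub find_task {
    my ($tasks, $name, $version) = @_;
    for my $task (@$tasks) {
        my ($task_name, $task_version) = @$task;
        next unless lc($task_name) eq lc($name);
        # A blank or missing version means "any version".
        next if defined $version && length $version && $task_version ne $version;
        return $task;
    }
    return;
}

my $tasks = [ [ "A", "1.1" ], [ "C", "2.4" ], [ "E", "1.3.1" ] ];

my $any   = find_task($tasks, "c", "");      # blank version
my $exact = find_task($tasks, "C", "2.9");   # version not in the list

print $any   ? "found @$any\n"   : "not found\n";
print $exact ? "found @$exact\n" : "not found\n";
```

Since the blank case never reaches a pattern match, it does not depend on which regex last succeeded.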
