RegEx Matching, Loop Breaking...

by rardoe (Initiate)
on May 28, 2013 at 17:12 UTC (#1035657, perlquestion)
rardoe has asked for the wisdom of the Perl Monks concerning the following question:

So I have the code below. I know there are some extra bits, but it is not behaving how I would like. Basically, I read in one file (XML), then a second file (DOMAINS), and loop through them: if a line in XML contains anything from a line in DOMAINS, I set a variable to nothing and exit the innermost loop. The problem is that nothing ever matches. If I replace the $cleanedDomain line with a literal, it matches. The XML file is XML and the domains file is just a list of domain names. It looks right to me, but I think I am missing something.
    use strict;
    use warnings;

    my $linexml;
    my $cleanedXML;
    my $cleanedDomain;

    open (XML, 'domainList.xml');
    while (my $xmlline = <XML>) {
        $linexml = $xmlline;
        $cleanedXML = quotemeta $xmlline;
        open (DOMAINS, 'dead_domains.txt');
        while (my $domainline = <DOMAINS>) {
            $cleanedDomain = quotemeta $domainline;
            if ($cleanedXML =~ m/$cleanedDomain/) {
                $linexml = '';
                last;
            }
        }
        close (DOMAINS);
        open (MYFILE, '>>new_domainList.xml');
        print MYFILE "$linexml";
        close (MYFILE);
    }
    close (XML);

Re: RegEx Matching, Loop Breaking...
by toolic (Chancellor) on May 28, 2013 at 17:30 UTC
      chomp doesn't change anything; I've tried it with and without.
Re: RegEx Matching, Loop Breaking...
by Laurent_R (Parson) on May 28, 2013 at 17:50 UTC

    Aside from the fact that you need to chomp the data from your file, I think it is probably not a good idea to open, read and close a file for every single line of input of another file.

    You probably want to read your domain file only once at the beginning, clean and store its content in memory (in an array or a hash) for when you process your other file.
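The read-once approach suggested above might look something like the following. This is only a sketch, not the original poster's code: the helper names build_domain_regex and filter_lines are hypothetical, the sample data stands in for the two files, and it assumes that a plain substring match against each XML line is what is wanted. Trailing whitespace (including the newline) is stripped from each domain before it is escaped, and quotemeta is applied only to the pattern side.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of domain lines into one compiled regex.
# Returns undef if no usable domains are given.
sub build_domain_regex {
    my @domains = @_;
    s/\s+\z// for @domains;               # drop trailing newline / CR / spaces
    @domains = grep { length } @domains;  # skip empty lines
    return undef unless @domains;
    my $alternation = join '|', map { quotemeta } @domains;
    return qr/$alternation/;
}

# Hypothetical helper: keep only the lines that contain none of the domains.
sub filter_lines {
    my ($re, @lines) = @_;
    return grep { !defined($re) || $_ !~ $re } @lines;
}

# Sample data standing in for dead_domains.txt and domainList.xml.
my @dead = ("example.com\n", "dead-site.org\n");
my @xml  = ("<domain>example.com</domain>\n", "<domain>alive.net</domain>\n");

my $re = build_domain_regex(@dead);
print filter_lines($re, @xml);   # prints only the alive.net line
```

With real files, the domain list would be read once before the main loop, and the output file would be opened once rather than reopened in append mode for every line.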

Re: RegEx Matching, Loop Breaking...
by hdb (Prior) on May 28, 2013 at 18:56 UTC

    Why do you need to "quotemeta" $xmlline? On the left-hand side of the match this should not be needed.

    Without the files it is difficult to develop more ideas, the logic seems ok. I would add a print statement just before the match to see the contents of $cleanedDomain and $cleanedXML.

    I agree with comments above that you should revisit your open/close policy.

      Taking the quotemeta out fixed my issue. Thanks!

        The reason taking the quotemeta out fixed the problem is that a quotemeta-ed string that contains \W characters will, in general (there may be some oddball corner cases), not regex-match with itself after the string has been double-quote-ishly interpolated into a regex. After a string is quotemeta-ed, something like '\-' in the quotemeta-ed string is literally a backslash-hyphen sequence of characters, but this sequence in a regex matches only a hyphen, leaving the backslash in the quotemeta-ed string unmatched.

        >perl -wMstrict -le
        "my $s = 'a-b c*d';
         my $qm_s = quotemeta $s;
         print qq{raw: '$s'  quotemeta-ed: '$qm_s'};
         ;;
         printf qq{%s equal \n}, $qm_s eq $qm_s ? '' : 'NOT ';
         printf qq{%s match \n}, $qm_s =~ /$qm_s/ ? '' : 'NO ';
        "
        raw: 'a-b c*d'  quotemeta-ed: 'a\-b\ c\*d'
         equal
        NO  match
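For contrast, here is a minimal sketch of the working form, where only the pattern side is escaped and the string being searched is left raw. The helper name contains_literal and the sample strings are made up for illustration; \Q...\E inside the regex has the same effect as calling quotemeta on the interpolated variable.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $haystack contains $needle as a literal
# substring. Only the pattern is escaped; the haystack is used as-is.
sub contains_literal {
    my ($haystack, $needle) = @_;
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

my $xml_line = '<domain>dead-site.org</domain>';
my $domain   = 'dead-site.org';

# Raw haystack, escaped pattern: the hyphen and dots match literally.
print contains_literal($xml_line, $domain) ? "match\n" : "no match\n";

# Escaping the haystack as well inserts backslashes the pattern cannot match.
print contains_literal(quotemeta($xml_line), $domain) ? "match\n" : "no match\n";
```

The first call prints "match" and the second prints "no match". For a pure substring test with no regex features, index($haystack, $needle) >= 0 is another option that avoids escaping altogether.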
