PerlMonks  

Regex Confusion

by Kelly22 (Novice)
on Feb 10, 2010 at 14:26 UTC ( #822426=perlquestion )
Kelly22 has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on a Perl script for a while and everything is going well, other than that I am constantly confused by how regexes work.

It is my understanding that a line like string =~ s/"//; should replace all occurrences of " with nothing. However the strings it acts on usually have 2 ", and to eliminate both of them I have resorted to repeating the line.

Another series of regexes is not working as I expect either.

foreach (@content) {
    if ($content[$count] =~ /entertainment_headlines.gif/
        or $content[$count] =~ /Chewy Headline Snacks/
        or $content[$count] =~ /FEATURED INK/
        or $content[$count] =~ /StatCounter/
        or $content[$count] =~ /NW Insider/
        or $content[$count] =~ /twitterbadge22/) {
        splice(@content, $count, 1);
    }
    $count++;
}
I believe this should eliminate array entries that contain these strings, but it doesn't seem to work most of the time. It doesn't eliminate the array entry that ends with
</table> </div> <div class="moduletable"> <h3> FEATURED INK </h3> <table class="contentpaneopen">

or the one that contains

<!-- Start of StatCounter Code --> <br /><script type="text/javascript"> var sc_project=3028426; var sc_invisible=0; var sc_partition=32; var sc_security="0f2a9250"; </script> <script type="text/javascript" src="http://www.statcounter.com/counter/counter_xhtml.js"></script><noscript><div class="statcounter"><a class="statcounter" href="http://www.statcounter.com/"><img class="statcounter" src="http://c33.statcounter.com/3028426/0/0f2a9250/0/" alt="free html hit counter" /></a></div></noscript> <br /><!-- End of StatCounter Code -->

And while I am at it, I also want to eliminate all newline characters from a string and assumed that $string =~ s/\n//; would do that, but it doesn't seem to work at all.

Clearly I am not getting something about regexes. I have read up on them and it's still not clear to me what I'm doing wrong. Any help or generally some negative reinforcement would be greatly appreciated.

Re: Regex Confusion
by biohisham (Priest) on Feb 10, 2010 at 14:34 UTC
    Is your 'string' a valid variable? It is not preceded by a sigil. To delete more than one double quote you can use $string =~ s/"+//; although the quantifier only covers quotes that sit next to each other.

    See quantifiers and matching repetitions.

    UPDATE: The g modifier is more robust so toolic's solution is better.



    Excellence is an Endeavor of Persistence. Chance Favors a Prepared Mind.
Re: Regex Confusion
by toolic (Chancellor) on Feb 10, 2010 at 14:34 UTC
    It is my understanding that a line like string =~ s/"//; should replace all occurrences of " with nothing.
    That is incorrect. You need to use the g modifier to globally replace all quotes:
    $string =~ s/"//g;
    Your code only replaces the 1st quote.
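The difference between the three substitutions discussed so far can be checked with a small sketch. The sample string and the variable names are assumptions made up for illustration, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample input: one string containing two double quotes.
my $input = 'say "hello" now';

# Without /g, s/// stops after the first successful match.
(my $first_only = $input) =~ s/"//;

# A quantifier removes one run of adjacent quotes, but still only the first run.
(my $first_run = $input) =~ s/"+//;

# With /g, every match in the string is replaced.
(my $all = $input) =~ s/"//g;

print "$first_only\n";   # say hello" now
print "$first_run\n";    # say hello" now
print "$all\n";          # say hello now
```

Copying into a new variable before substituting keeps $input intact, so the three results can be compared side by side.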
Re: Regex Confusion
by Ratazong (Prior) on Feb 10, 2010 at 14:40 UTC

    I also want to eliminate all newline characters from a string

    Original suggestion (retracted): use chomp. Instead, use the solution from toolic below. (I should have read the text more carefully ... sorry)
      chomp does not remove all newlines from a string, only the last newline. This removes all newlines:
      $string =~ s/\n//g;
      More precisely:
      This safer version of "chop" removes any trailing string that corresponds to the current value of $/ (also known as $INPUT_RECORD_SEPARATOR in the English module).
        Thank you, I really should have known that, but now I should never forget.
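A short sketch of the chomp versus s/\n//g distinction, assuming a made-up three-line string and the default value of $/:

```perl
use strict;
use warnings;

# Hypothetical sample text with three newlines, one of them trailing.
my $text = "line one\nline two\nline three\n";

# chomp removes only the trailing record separator (a newline by default).
my $chomped = $text;
chomp $chomped;

# Count the newlines that survive chomp (list assignment in scalar context).
my $remaining = () = $chomped =~ /\n/g;

# s/\n//g removes every newline, wherever it occurs.
(my $flat = $text) =~ s/\n//g;

print "$remaining\n";   # 2
print "$flat\n";        # line oneline twoline three
```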
Re: Regex Confusion
by ysth (Canon) on Feb 10, 2010 at 14:45 UTC
    You shouldn't add or remove elements from an array you are looping over with foreach. Fortunately, you aren't actually using the loop variable $_; even so, you will be accidentally skipping over the element after one you've spliced away, since it will move into the $count position just as you move on to look at $count+1.

    Instead, do:

    for my $count ( reverse 0..$#content ) {
        # do stuff with $content[$count], including splicing it out of existence
    }
    --
    A math joke: r = | |csc(θ)|+|sec(θ)|-||csc(θ)|-|sec(θ)|| |
      That makes a lot of sense, and explains my confusion concerning why it seemed to work sometimes. Thank you. And thank you to everyone for your rapid responses.
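The skipping effect and the reverse-index fix can be sketched as follows. The sample data, the combined $unwanted pattern, and the array names are assumptions for illustration. A C-style index loop stands in for the original foreach so that the behaviour is well defined, and the grep line is a different technique offered only as an alternative to splicing.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for @content; two unwanted entries are adjacent.
my @content = (
    'keep: article text',
    '<h3> FEATURED INK </h3>',
    '<!-- Start of StatCounter Code -->',
    'keep: more text',
    'NW Insider',
);

# One alternation for the markers named in the question (the dot is escaped here).
my $unwanted = qr/entertainment_headlines\.gif|Chewy Headline Snacks|FEATURED INK|StatCounter|NW Insider|twitterbadge22/;

# Forward index loop: after a splice the next element slides into slot $i,
# and $i++ then steps past it without testing it.
my @forward = @content;
for ( my $i = 0; $i < @forward; $i++ ) {
    splice( @forward, $i, 1 ) if $forward[$i] =~ $unwanted;
}

# Reverse index loop: a splice only shifts elements that were already examined.
my @backward = @content;
for my $i ( reverse 0 .. $#backward ) {
    splice( @backward, $i, 1 ) if $backward[$i] =~ $unwanted;
}

# Alternative without splice: build a new list of the entries that do not match.
my @filtered = grep { $_ !~ $unwanted } @content;

print scalar(@forward),  "\n";   # 3 (the StatCounter entry was skipped)
print scalar(@backward), "\n";   # 2
print scalar(@filtered), "\n";   # 2
```

The forward loop leaves the StatCounter entry behind because it directly follows another removed entry, which matches the "works only sometimes" symptom described in the question.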
Re: Regex Confusion
by jwkrahn (Monsignor) on Feb 10, 2010 at 15:12 UTC

    Regular expressions are normally used to match patterns but if you just want to remove single characters it is more efficient to use the tr operator:

    $string =~ tr/"//d;
    $string =~ tr/\n//d;
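As a minimal sketch, both characters can also go into a single tr call, whose return value is the number of characters it matched. The sample string and the $removed variable are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: two quoted words, each followed by a newline.
my $string = qq{"first"\n"second"\n};

# tr with /d deletes every listed character and returns how many it matched.
my $removed = ( $string =~ tr/"\n//d );

print "$string\n";    # firstsecond
print "$removed\n";   # 6
```

No /g is needed here: tr always processes the whole string, character by character.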

Node Type: perlquestion [id://822426]
Approved by Corion