PerlMonks  

More Regexp Confusion

by mdunnbass (Monk)
on Feb 01, 2007 at 21:57 UTC ( [id://597839]=perlquestion )

mdunnbass has asked for the wisdom of the Perl Monks concerning the following question:

I posted a question similar to this one last week, and got some very helpful answers. But since I didn't frame the question well, I didn't get the answers I was looking for.

Scenario:
Big text file, maybe HTML, maybe plain text. Looking to replace all instances of $foo with $stuff1.$foo.$stuff2.

$foo is from uc(chomp($foo = <STDIN>));, and is only English letters. However, let's say $foo = 'GOAT'; the file I'm looking through doesn't have 'GOAT' anywhere, but it does have asdfGxxOxxAxxTqwerty and other permutations.

I want to make:

asdfG..O..A..Tqwerty

become:

'asdf'.$stuff1.'GxxOxxAxxT'.$stuff2.'qwerty'

And before I can mislead you or anything, I don't care if the xx's are xx or if the qwerty is qwerty. They could be anything matching [^A-Z]. I just want to ignore them and leave them undisturbed.

Essentially, I want to do this globally, and the only change I want to make to the text file is adding $stuff1 and $stuff2.

I've tried various things, all to abysmal and spectacular failure. I think it should be something along the following lines, but I know as is, this is wrong:

    # disclaimer - assume use strict and warnings are both in effect, and
    # variables are lexically named elsewhere.. this is a toss off bit o' code
    while (my $line = <FH>) {
        $test =~ join ( split ( /(?:[^A-Z])/, $line ) );
        # the previous line will excise any non uppercase, non letters from $line
        # now, we compare $line to our $foo
        while ($test =~ /$foo/g ) {
            $line =~ s/$1/($stuff1 . $1 . $stuff2)/e
        }
    }

Of course, I recognise that the $1 there corresponds to the GOAT and not the GxxOxxAxxT, but I am at a loss as to how to proceed. Any thoughts? I know that many people responded to my OP suggesting something like:

    my $x = <STDIN>;
    chomp $x;
    $x = uc($x);
    my $regex = join('[^A-Z]*', split //, $x);

..and that the method I proposed above is more of the inverse...

I just wasn't able to get the original suggestions to work for me, and I attributed it to me not describing the problem well enough.

Thanks for any insights
Matt
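
A side note on the input line quoted in the question, `uc(chomp($foo = <STDIN>))`: `chomp` returns the number of characters it removed, not the chomped string, and `uc` returns a new string rather than changing its argument. The nested form therefore leaves `$foo` chomped but not uppercased. The sketch below illustrates this with a literal string standing in for `<STDIN>`; the helper name `normalize_needle` is hypothetical and not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one raw input line into the uppercase search word.
sub normalize_needle {
    my ($raw) = @_;
    chomp $raw;        # chomp edits $raw in place and returns a count
    return uc $raw;    # uc returns a new string; it does not modify $raw
}

# What the nested form computes: uc() is applied to chomp's return value
# (the number of characters removed), not to the string itself.
my $foo    = "goat\n";          # stands in for a line read from STDIN
my $nested = uc( chomp($foo) );

print "nested result:  $nested\n";   # the count, not 'GOAT'
print "foo afterwards: $foo\n";      # chomped, but still lowercase
print "helper result:  ", normalize_needle("goat\n"), "\n";
```

Doing the two steps separately, as in the `chomp $x; $x = uc($x);` snippet quoted in the question, avoids the problem.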

Replies are listed 'Best First'.
Re: More Regexp Confusion
by GrandFather (Saint) on Feb 01, 2007 at 22:22 UTC

    On the face of it

        use strict;
        use warnings;

        my $str = 'asdfGxxOxxAxxTqwerty';
        my $foo = 'GOAT';
        my @letters = split //, $foo;
        my $match = join '.*', @letters;

        $str =~ s/($match)/!$1!/g;
        print $str;

    Prints:

    asdf!GxxOxxAxxT!qwerty

    does what you want. But I can't help thinking that you are still asking the wrong question. What is the problem you are trying to solve? You have asked how to implement a solution, but we may be able to help more if we know what the actual problem is.


    DWIM is Perl's answer to Gödel
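
One caveat with joining the letters on `.*`: the quantifier is greedy and matches any character, so when a line holds more than one occurrence, a single match can stretch from the first G to the last T. The sketch below compares that filler with the `[^A-Z]*` filler mentioned in the question. The helper `mark_matches` and the sample line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every match of the needle's letters, joined by
# the given filler pattern, in '!' markers.
sub mark_matches {
    my ($text, $needle, $filler) = @_;
    # quotemeta is a precaution; the question says the needle is letters only.
    my $pattern = join $filler, map { quotemeta } split //, $needle;
    $text =~ s/($pattern)/!$1!/g;
    return $text;
}

my $line = 'aGxOxAxTb GyOyAyTc';   # two separate occurrences on one line

print mark_matches($line, 'GOAT', '.*'),      "\n";  # a!GxOxAxTb GyOyAyT!c
print mark_matches($line, 'GOAT', '[^A-Z]*'), "\n";  # a!GxOxAxT!b !GyOyAyT!c
```

With the negated class, the filler can never run past an uppercase letter, so each occurrence is marked on its own.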
Re: More Regexp Confusion
by johngg (Canon) on Feb 01, 2007 at 23:54 UTC
    This seems to do what I think you want using your data.

        use strict;
        use warnings;

        my @strings = (
            q{asdfgGOATaewrjgn},
            q{ererGOAskjgbrTslkgjnjng},
            q{ccG<!-- comment on status of global geopolitical economics -->OAjgTsvs},
            q{aGO<h3>i hate clowns</h3>ATbbb});

        my $needle = q{GOAT};
        my @parts = split m{}, $needle;
        my $notNeedle = qq{[^$needle]*};
        my $haystackPatt = q{(} . join($notNeedle, @parts) . q{)};
        my $rxHaystack = qr{$haystackPatt};

        my $stuff1 = q{^^^};
        my $stuff2 = q{+++};

        foreach my $string ( @strings )
        {
            print qq{$string\n};
            $string =~ s{$rxHaystack}{$stuff1$1$stuff2};
            print qq{$string\n\n};
        }

    It produced this output.

        asdfgGOATaewrjgn
        asdfg^^^GOAT+++aewrjgn

        ererGOAskjgbrTslkgjnjng
        erer^^^GOAskjgbrT+++slkgjnjng

        ccG<!-- comment on status of global geopolitical economics -->OAjgTsvs
        cc^^^G<!-- comment on status of global geopolitical economics -->OAjgT+++svs

        aGO<h3>i hate clowns</h3>ATbbb
        a^^^GO<h3>i hate clowns</h3>AT+++bbb

    I hope this is useful.

    Cheers,

    JohnGG

    Update: Fixed typos

    Update 2: Changed initialisation of $notNeedle so string was not hard coded, original line was my $notNeedle = q{[^GOAT]*};

      Thanks so much!

      I tried this code over the weekend, and it definitely does what I wanted it to. I made a few minor modifications, to suit my needs more specifically of course, but otherwise, you hit the nail on the head.

      Thanks again to everyone who helped!
      Matt
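
Two details of johngg's code differ from the question as stated: the substitution has no `/g`, so only the first occurrence per string is wrapped, and the filler class `[^GOAT]*` still lets other uppercase letters through, whereas the question asks for `[^A-Z]`. The following is a minimal sketch of a variant with both changes. It is not the modified code the asker ended up with, which the thread does not show, and the name `wrap_needle` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every occurrence of the needle's letters,
# separated by any run of non-uppercase characters, in $stuff1 / $stuff2.
sub wrap_needle {
    my ($text, $needle, $stuff1, $stuff2) = @_;
    my $pattern = join '[^A-Z]*', map { quotemeta } split //, $needle;
    my $rx = qr/($pattern)/;
    $text =~ s/$rx/$stuff1$1$stuff2/g;   # /g: replace all occurrences
    return $text;
}

my ($stuff1, $stuff2) = ('^^^', '+++');

# Two occurrences in one string; both are wrapped.
print wrap_needle('aGO<h3>i hate clowns</h3>ATbbb xGOATy', 'GOAT', $stuff1, $stuff2), "\n";

# An extra capital between the letters: [^GOAT]* would accept the B,
# [^A-Z]* does not, so this string is left unchanged.
print wrap_needle('xGBOATy', 'GOAT', $stuff1, $stuff2), "\n";
```

Whether the stricter class is the right choice depends on whether stray capitals between the letters should block a match, which the thread leaves to the asker.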

Re: More Regexp Confusion
by AltBlue (Chaplain) on Feb 01, 2007 at 22:15 UTC
    What do you mean by "xx"? A "constant" string? For example, if $foo = 'GOAT', you mean to match strings like:
    G..O..A..T
    GaaOaaAaaT
    GabcOabcAabcT
    G<<<<<O<<<<<A<<<<<T
    
    If this assumption is correct, this is a possible way (yet quite inefficient) to handle it:
        my @foo = split //, $foo;
        $foo = shift(@foo) . q{(.*?)} . join q{\2}, @foo;

        while ( my $line = <FH> ) {
            # the outer parentheses are group 1, so the filler group is \2
            $line =~ s/($foo)/$stuff1$1$stuff2/g;
        }
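
To make the backreference idea concrete: the filler after the first letter is captured once, and every later gap must repeat exactly that text. The sketch below builds the same kind of pattern without the outer capture, so the filler is group 1 and the backreference is `\1`. The helper name `same_filler_regex` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex in which every gap between the
# needle's letters must contain the same text.
sub same_filler_regex {
    my ($needle) = @_;
    my @letters = map { quotemeta } split //, $needle;
    my $first   = shift @letters;
    # Group 1 captures the filler after the first letter; each \1 then
    # requires that exact text again before the next letter.
    my $pattern = $first . '(.*?)' . join '\1', @letters;
    return qr/$pattern/;
}

my $rx = same_filler_regex('GOAT');

for my $candidate (qw( G..O..A..T GabcOabcAabcT GOAT GabOcdAefT )) {
    printf "%-15s %s\n", $candidate, ( $candidate =~ $rx ? 'matches' : 'no match' );
}
```

The last candidate fails because its three gaps differ, which is why this approach does not fit the arbitrary-filler case the asker describes in the follow-up.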
      Actually, it's not a constant string. I was trying to say in the OP that I didn't care what the 'xx' was, but I guess I didn't explain that very well.

      In reality, most likely, it'll take the form of possible HTML tags. I say possible, because there most likely will be none, but there might be some, and if they're there, they may be either opening or closing tags, or even tags with multiple attributes. I do know that any text will be lowercase. So, for instance, given $foo = 'GOAT', I want all of the following to match:

          asdfgGOATaewrjgn
          ererGOAskjgbrTslkgjnjng
          ccG<!-- comment on status of global geopolitical economics -->OAjgTsvs
          aGO<h3>i hate clowns</h3>ATbbb

      And in the last example, the output that I want would be:

      'a' . $stuff1 . 'GO<h3>i hate clowns</h3>AT' . $stuff2 . 'bbb'
      I hope that helps.

      Matt
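
If the filler really is HTML, one possible wrinkle is that tag or attribute names may contain capitals (for example `<H3 CLASS="x">`), which a plain `[^A-Z]*` filler would refuse to cross. The sketch below assumes a filler made of either a whole `<...>` tag or a single non-uppercase character. This is a rough regex approximation, not an HTML parser, and the name `wrap_tagged` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: like the [^A-Z]* approach, but a gap may also
# contain complete <...> tags, whatever their letter case.
sub wrap_tagged {
    my ($text, $needle, $stuff1, $stuff2) = @_;
    # Either a tag up to its first '>' or one character that is neither
    # an uppercase letter nor the start of a tag.
    my $filler  = '(?:<[^>]*>|[^A-Z<])*';
    my $pattern = join $filler, map { quotemeta } split //, $needle;
    $text =~ s/($pattern)/$stuff1$1$stuff2/g;
    return $text;
}

print wrap_tagged('aGO<h3>i hate clowns</h3>ATbbb', 'GOAT', '^^^', '+++'), "\n";
print wrap_tagged('aGO<H3 CLASS="x">i hate clowns</H3>ATbbb', 'GOAT', '^^^', '+++'), "\n";
```

Because the two alternatives start with different characters, the filler does not backtrack between them, but attribute values that contain a literal `>` followed by capitals would still defeat it.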
