PerlMonks  

regex match word, don't match word preceded by slash

by lepetitalbert (Monsignor)
on Nov 19, 2010 at 01:41 UTC (#872404=perlquestion)
lepetitalbert has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks

#!/usr/bin/perl
use strict;
use warnings;

my $string = qq|<a href="/cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>|;
my $item = 'perl';
$string =~ s/(\s?[-\w\.]*(?<!\/)$item[-\w\.]*\s?)/<b>$1<\/b>/gi;
print $string . "\n";

output :

<a href="/<b>book-cgi-programming-with-perl.zip</b>"><b>book-cgi-programming-with-perl.zip</b></a>

I'd like to match

book-cgi-programming-with-perl.zip

but not

/book-cgi-programming-with-perl.zip

I think I need a lookbehind, but after a long afternoon I have found no solution.

So any tip, hint, solution would be welcome.

Thanks !

Have a nice day

"There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

Re: regex match word, don't match word preceded by slash
by kcott (Abbot) on Nov 19, 2010 at 02:26 UTC

    I've expanded the solution so it's a bit easier to read:

    $string =~ s{
        (?<![-\w./])
        (
            \s?
            [-\w.]*
            $item
            [-\w.]*
            \s?
        )
    }{<b>$1<\/b>}gimsx;

    Test output:

    $ regex_slash_prob.pl
    <a href="/cgi-programming-with-perl.zip"><b>cgi-programming-with-perl.zip</b></a>

    Update:

    While the solution above answers your question, consider the following.

    If the HTML actually looks more like this:

    my $string = qq| <a href="/cgi-programming-with-perl.zip"> cgi-programming-with-perl.zip </a> |;

    Your output will look like:

    <a href="/cgi-programming-with-perl.zip"> <b> cgi-programming-with-perl.zip </b> </a>

    If you'd prefer it to look like:

    <a href="/cgi-programming-with-perl.zip"> <b>cgi-programming-with-perl.zip</b> </a>

    Remove both \s? lines, leaving:

    $string =~ s{
        (?<![-\w./])
        (
            [-\w.]*
            $item
            [-\w.]*
        )
    }{<b>$1<\/b>}gimsx;

    You had them in your original so I left them in thinking they perhaps served some other purpose in the real data you're working on (as the string you posted contained no whitespace at all).

    -- Ken
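
The effect of the two \s? lines can be checked with a short script. This is only a sketch: the input is assumed to have exactly one space on each side of the link text (so the output differs slightly from the multi-space output quoted above), the variable names are illustrative, and \Q...\E is added here as a precaution in case $item ever contains regex metacharacters.

```perl
use strict;
use warnings;

my $item = 'perl';

# Assumed input: exactly one space on each side of the link text.
my $html = qq|<a href="/cgi-programming-with-perl.zip"> cgi-programming-with-perl.zip </a>|;

# With \s? inside the capture, one adjacent whitespace character on each
# side is captured too, so it ends up inside the <b> element.
(my $with_ws = $html) =~ s{
    (?<![-\w./])
    ( \s? [-\w.]* \Q$item\E [-\w.]* \s? )
}{<b>$1</b>}gix;

# Without \s?, only the run of [-\w.] characters is wrapped.
(my $without_ws = $html) =~ s{
    (?<![-\w./])
    ( [-\w.]* \Q$item\E [-\w.]* )
}{<b>$1</b>}gix;

print "$with_ws\n$without_ws\n";
```

With that input, the first line printed has the spaces inside the bold element (<b> cgi-programming-with-perl.zip </b>), and the second leaves them outside it ( <b>cgi-programming-with-perl.zip</b> ). The href is untouched in both.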

      Hi kcott,

      works like a charm !
      thank you very much :)

      Have a nice day !

      "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates
Re: regex match word, don't match word preceded by slash
by Anonymous Monk on Nov 19, 2010 at 02:50 UTC
    #!/usr/bin/perl --
    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $html = '<html><body>
    <a href="/cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>
    <a href="cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>
    </body></html>';
    {
        my $tree = HTML::TreeBuilder->new();
        $tree->ignore_ignorable_whitespace(0);
        $tree->no_space_compacting(1);
        $tree->parse( $html )->eof;
        $tree->look_down(
            qw' _tag a href ',
            qr!^/! ,
            sub {
                $_[0]->push_content(
                    HTML::Element->new('b')->push_content( $_[0]->detach_content ),
                );
                return;
            },
        );
        print $tree->as_HTML('<>&',' ',{}), "\n";
    }
    __END__
    <html>
     <head>
     </head>
     <body>
      <a href="/cgi-programming-with-perl.zip"><b>cgi-programming-with-perl.zip</b></a>
      <a href="cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>
     </body>
    </html>

      Hi again kcott,

      I tried so many combinations I can't remember why those spaces were there! Thank you again.

      I'm trying to understand

      (?<![-\w./])

      so the / is the one preceding the word
      but I don't get the -\w.
      if someone has 2 minutes left :)

      Thank you too, Anonymous Monk. I took a look at HTML::TreeBuilder, but I wouldn't have found your solution in one afternoon! (Is that English?)

      Have a nice day !

      "There is only one good, namely knowledge, and only one evil, namely ignorance." Socrates

        In the href part, if you have only the slash in the lookbehind, the regex engine finds that gi-programming-with-perl.zip is a match (the character before the g is a c, not a slash), and you end up with: /c<b>gi-programming-with-perl.zip</b>. By saying "not a slash or any other character I'm trying to match", /cgi-programming-with-perl.zip does not match at all. The link content, however, has a greater-than sign in that position (which is neither a slash nor one of the characters you're looking for, i.e. [-\w.]), so it does match. In my second example, the whitespace doesn't match [-\w./] either, so it works there too.

        -- Ken
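
The difference between the two lookbehinds can be seen side by side with a small script. This is a sketch, not code from the thread: bold_item is a hypothetical helper, the \s? parts are left out for brevity, and \Q...\E is added as a precaution in case $item contains regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): wrap every run of [-\w.]
# characters containing $item in <b>...</b>, guarded by the given lookbehind.
sub bold_item {
    my ($string, $item, $lookbehind) = @_;
    $string =~ s{ $lookbehind ( [-\w.]* \Q$item\E [-\w.]* ) }{<b>$1</b>}gix;
    return $string;
}

my $html = qq|<a href="/cgi-programming-with-perl.zip">cgi-programming-with-perl.zip</a>|;

# Slash-only lookbehind: the engine skips the "c" right after the slash
# and starts the match one character later, inside the href.
my $narrow = bold_item($html, 'perl', qr{(?<!/)});

# Lookbehind that also rejects word characters, "-" and ".": a match can
# only start at the beginning of a run that is not preceded by a slash.
my $wide = bold_item($html, 'perl', qr{(?<![-\w./])});

print "$narrow\n$wide\n";
```

The first line printed contains /c<b>gi-programming-with-perl.zip</b> inside the href, which is the partial match described above. The second leaves the href alone and wraps only the link text.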

Node Type: perlquestion [id://872404]
Approved by kcott