Re: regex: something...(!something)...something
by Roy Johnson (Monsignor) on Jul 17, 2008 at 19:41 UTC
|
My tutorial on look-ahead and look-behind covers this.
You're specifying that the open tag is not followed immediately by another open tag; you want to specify that it's followed by any arbitrary text that isn't an open tag. This s/// expression should do what you want:
$x =~ s{(?<=<td>)((?:(?!<td>).)*)3((?:(?!</td>).)*)(?=</td>)}{$1a$2};
Update: I should have broken this down in commented style (I also note that I could have done lookahead for everything after the 3):
$x =~ s{(?<=<td>) # Start match with open-tag
( # Capture
(?:(?!<td>).)* # Any number of characters that do not start an open-tag
)3 # Close capture; match literal 3
(?= # Look ahead to match
(?:(?!</td>).)* # Any number of characters that do not start a close-tag
</td> # then a close-tag
)} # End lookahead and pattern
{$1a}x;
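One way to check the commented pattern is to run it against the sample row from the question. This is only a minimal sketch: the variable names `$row` and `$changed` are made up for illustration, and the input is the row posted elsewhere in this thread, not something from the post above.

```perl
use strict;
use warnings;

# Sample row from the question (the trailing <tr> is as posted).
my $row = '<tr><td>1</td><td>2</td><td><font>3</font></td><td>4</td><tr>';

# Copy the row, then substitute on the copy.
(my $changed = $row) =~ s{
    (?<=<td>)               # an open-tag must sit just before the match
    ( (?:(?!<td>).)* )      # capture text that never starts another open-tag
    3                       # the literal 3 to replace
    (?= (?:(?!</td>).)*     # then text that never starts a close-tag
        </td> )             # followed by the close-tag itself
}{$1a}x;

print "$changed\n";
# prints <tr><td>1</td><td>2</td><td><font>a</font></td><td>4</td><tr>
```

Only the cell that actually contains the 3 changes, because the guarded dot cannot run past the start of the next open-tag.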
Caution: Contents may have been coded under pressure.
|
Thank you! It also works (I see it is more complex than olus's regex, but probably more universal).
I'll read your tutorial and will try never to ask stupid questions about regexes again :)
Re: regex: something...(!something)...something
by GrandFather (Sage) on Jul 17, 2008 at 21:09 UTC
|
For markup, use the appropriate parsing module. That will save you a pile of time, and substantive questions won't get clouded by the "but you should be using a module" answer: you've preempted that answer and can move on to real issues.
So, let's look at the real issue without the distraction of HTML, XML, and so on. Consider:
use strict;
use warnings;
my $match1 = 'something ';
my $fill = 'xxxxxx ';
my $match2 = 'somethingelse ';
my $nomatch = 'nomatch ';
my @targets = (
"$nomatch$fill$match1$match2$nomatch",
"$nomatch$fill$match1$fill$match2$nomatch",
"$nomatch$fill$match1$fill$nomatch$fill$match2$nomatch",
);
for my $test (@targets) {
if ($test =~ /$match1 ((?:(?!$nomatch) .)*) $match2/x) {
print "Matched >$1< for: $test\n";
} else {
print "Failure for $test\n";
}
}
Prints:
Matched > < for: nomatch xxxxxx something somethingelse nomatch
Matched > xxxxxx < for: nomatch xxxxxx something xxxxxx somethingelse nomatch
Failure for nomatch xxxxxx something xxxxxx nomatch xxxxxx somethingelse nomatch
The trick is the (?:(?!$nomatch) .)* which will only match a character if it is not the start of the nomatch criteria.
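To see why the guard matters, it may help to compare it with a plain lazy `.*?`. The following is a small sketch under assumed inputs: the sample string and the marker word `STOP` are hypothetical and stand in for whatever must not appear between the two anchors.

```perl
use strict;
use warnings;

# Hypothetical sample text: the forbidden word sits between the two anchors.
my $text = 'begin aaa STOP bbb end';

# A lazy .*? happily walks across STOP to reach "end".
my $lazy = $text =~ /begin(.*?)end/ ? 1 : 0;

# The guarded dot refuses to consume any character that starts STOP,
# so the match cannot bridge the forbidden word.
my $guarded = $text =~ /begin((?:(?!STOP).)*)end/ ? 1 : 0;

print "lazy: $lazy, guarded: $guarded\n";   # lazy: 1, guarded: 0
```

Laziness only changes which match is tried first; it does not forbid anything. The lookahead inside the repeated group is what actually excludes the unwanted text.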
Perl is environmentally friendly - it saves trees
|
I'm beginning to understand the idea behind the trick, thank you.
In my case the regex must be more complex; I'll post the solution in my first post soon.
Re: regex: something...(!something)...something
by olus (Curate) on Jul 17, 2008 at 17:31 UTC
|
use strict;
use warnings;
my @lines = <DATA>;
my $data;
foreach $data (@lines) {
if($data =~ /sometext (?!not)\w* endtext/) {
print "$data passed \n";
}
}
__DATA__
sometext not endtext
sometext positive endtext
outputs
sometext positive endtext
passed
|
Thanks, I see it works, but only for simple cases... please look above (I posted a snippet of my code that has a bug in it); maybe you will be able to correct it.
Re: regex: something...(!something)...something
by olus (Curate) on Jul 17, 2008 at 18:35 UTC
|
man, you are hard to please
use strict;
use warnings;
my $text="<tr><td>1</td><td>2</td><td>qw<font>3</font></td><td>4</td><tr>";
$text =~ s/<td>((?:(?!<td>).)*)3((?:(?!<\/td>).)*)/$1a$2/;
print "$text";
outputs
<tr><td>1</td><td>2</td>qw<font>a</font></td><td>4</td><tr>
|
$text =~ s/(<td>)((?:(?!<td>).)*)3((?:(?!<\/td>).)*)/$1$2a$3/;
Yes! Thank you! It is what I wanted so much :) *I corrected your code a bit, and it seems to work well*
Re: regex: something...(!something)...something
by pileofrogs (Priest) on Jul 17, 2008 at 17:24 UTC
|
/^somebegin(?!something)someend$/
Should work. The only complexity involves lookahead vs. backtrack and because you have stuff both before and after the stuff you don't want, that won't matter (in terms of the truth of the statement, I have no idea about the efficiency).
Personally, if I'm confused by something like this I opt for the slow but readable...
if ( /^somebegin(.*)someend$/ ) {
my $middle = $1;
if ( $middle !~ /^something$/ ) {
# woot
}
}
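The two-step check could be wrapped in a small helper along these lines. This is just a sketch: `middle_is_allowed` and the sample strings are hypothetical names picked for illustration, and it assumes the goal is to reject lines whose middle part is exactly "something".

```perl
use strict;
use warnings;

# Hypothetical helper: true when the string has the expected begin/end
# markers and the text between them is not exactly "something".
sub middle_is_allowed {
    my ($line) = @_;
    return 0 unless $line =~ /^somebegin(.*)someend$/;
    my $middle = $1;
    return $middle !~ /^something$/ ? 1 : 0;
}

print middle_is_allowed('somebeginXYZsomeend')       ? "ok\n" : "rejected\n";
print middle_is_allowed('somebeginsomethingsomeend') ? "ok\n" : "rejected\n";
```

Splitting the test in two keeps each regex trivial to read, at the cost of a second match.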
|
Hey! That is really not flexible.
Here is a more complex question:
there is some HTML.
<tr><td>1</td><td>2</td><td>3</td><td>4</td><tr>
OR with <font>: the tag may be absent, or something else may be there in its place
<tr><td>1</td><td>2</td><td><font>3</font></td><td>4</td><tr>
I need to do something like this:
replace <td>..(!<td>)..3..(!</td>)..</td>
with <td>(everything that was in the left middle before 3)TEXT(everything that was in the right middle after 3)</td>
I hope you understand me.
|
use strict;
use warnings;
my $text="<tr><td>1</td><td>2</td><td>qw<font>3</font></td><td>4</td><tr>";
my @blocks = split /<td>(.*?)<\/td>/, $text;
foreach my $block (@blocks) {
if($block =~ /3/) {
print $block."\n";
}
}
outputs
qw<font>3</font>
But you should consider one of the many HTML parser modules.
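As a variation on the split idea, a global match in list context collects only the cell contents, without the leftover pieces that split returns between cells. This is a sketch, not a replacement for a real parser; `@cells` and `@hits` are illustrative names, and it assumes cells are never nested.

```perl
use strict;
use warnings;

my $text = "<tr><td>1</td><td>2</td><td>qw<font>3</font></td><td>4</td><tr>";

# In list context, /g returns every capture from every match.
my @cells = $text =~ /<td>(.*?)<\/td>/g;

# Keep only the cells whose content mentions a 3.
my @hits = grep { /3/ } @cells;

print "$_\n" for @hits;   # prints qw<font>3</font>
```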
|
Hm... I see this is an ugly but working solution, and with some modifications I'll use it. But it is sad that there is no common regex to solve the problem (as I wrote in another post, I have met such problems not only with HTML).
|
Re: regex: something...(!something)...something
by poolpi (Hermit) on Jul 18, 2008 at 12:39 UTC
|
#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder;
my $html =
q{<tr><td>1</td><td>2</td><td><font>3</font></td><td>4</td><tr>};
my $tree = HTML::TreeBuilder->new_from_content($html);
my $td = $tree->look_down( '_tag',
'td',
sub { $_[0]->as_text =~ m/\b3/ });
$td->replace_with_content();
$tree->delete;
hth, PooLpi
|
It looks like XPath's brother for HTML :) Thank you, I think I'll use this solution when I work directly with HTML code. For the solution, please see the original question at the top of the thread; I added the solution there.
Re: regex: something...(!something)...something
by eosbuddy (Scribe) on Jul 17, 2008 at 18:36 UTC
|
Hi, perhaps I haven't understood your question (and hence this solution may be wrong)... this relates to the greedy nature of quantifiers. Please review the code below and let me know in the same syntax:
$x = "<tr><td>1</td><td>2</td><td><font>3</font></td><td>4</td><tr>";
print "$x\n";
$x =~ s/<td>.*(<td>.*?)3(.*?<\/td>).*<\/td>/<td>$1a$2<\/td>/;
print "$x\n";
gives me:
<tr><td>1</td><td>2</td><td><font>3</font></td><td>4</td><tr>
<tr><td><td><font>a</font></td></td><tr>
is this your desired output?
|
Sorry, but no, this is not the desired result. My mistake: I hadn't stated the desired result. If you are interested in the solution, please see olus's posts. Nevertheless, thanks for the help!
|
Hi,
Sorry about that, nonetheless, this code also will do the trick you want :-)
$x =~ s/<td>.*<td>(.*?)3(.*?)<\/td>.*<\/td>/<td>${1}3$2<\/td>/;  # ${1}: a bare $13 would be read as capture group 13
Re: regex: something...(!something)...something
by toolic (Bishop) on Jul 17, 2008 at 17:28 UTC
|
there must NOT be some text or regex snippet inside the block
Can you elaborate on that?
Otherwise, there is plenty to read on the topic at: perlretut, perlrequick, perlfaq6, perlre, etc.
|
I've read that.. I understand I'm not a perl master, but if you know the solution, please, tell me where I'm not right..
$x = "<tr><td>1</td><td>2</td><td><font>3</font></td><td>4</td><tr>";
$x =~ s{<td>(?!<td>)3(?!</td>)</td>}{<td>($1)a($2)</td>};
print $x;
It does not work... are there any ideas?
Update
Deep posts do not appear for some reason, so I'll post again at the top level:
You see, I have met this problem not only with HTML; this is a general question. I'm pretty sure there must be some regex that solves the problem, but my knowledge is too limited. Nevertheless, thanks :)
Maybe someone else knows a regex solution?
|
are there any ideas?
Yes, consider abandoning a regex approach, and select an appropriate CPAN solution for parsing HTML. I have used HTML::TokeParser, although I am not experienced enough with it to know if it will solve your problem.
|
|
Deep posts do not appear for some reason, so I'll post again at the top level:
I personally believe you should go to your User Settings rather than "post again at the top level"; reposting is not going to earn you anything. It's a matter of display anyway. Personally, I've set both Replies header depth and Replies text depth to 1000, but IIRC, with lower values you still get a pointer to deeper posts.