eq does not work with regular expressions;
it only tests for an exact string match:
my $a = "foo";
my $b = "bar";
my $c = ".*";
print "eq" if ($a eq $a); # prints "eq"
print "ne" if ($a ne $b); # prints "ne"
print "ne" if ($a ne $a); # prints nothing
print "RE" if $a =~ /$c/; # prints "RE"
print "RE" if $a =~ /f.*/; # prints "RE"
What you probably wanted
was something along these lines (tested :) ):
#!/usr/bin/perl -w
use strict;
my $filename = $ARGV[0] || "temp.html";
my $open;
undef $/; # undefine all line separators
open( FILE, $filename ) or die "Couldn't open $filename : $!\n";
$open = <FILE>; # This slurps the whole file into one scalar (instead of an array)
close FILE;
# I'll take a simplistic approach that assumes that
# the only place where a ">" occurs is at the end of
# a tag. This does fail when you have for example :
# <IMG src="less.png" alt="a > b">
# which is valid HTML from what I know.
# I also ignore scripts and comment handling.
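while (length $open) {
# Match leading text into $1 and (if a tag follows) the tag into $2:
$open =~ s/^([^<]+)?(<[^>]+>)?//;
# Stop if nothing was removed (for example a stray "<" without
# a closing ">"), otherwise the loop would never end:
last unless defined $1 or defined $2;
print "Text : $1\n" if defined $1;
print "HTML: $2\n" if defined $2;
};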
# the real meat of the code is the "s///;" line
# it works as follows :
# The two parenthesized parts capture stuff,
# the first parentheses capture non-tagged text
# the second parentheses capture text that is
# within "<" and ">"
# one or both of the parentheses are allowed to be empty
# Everything that is found is deleted from the start of
# the string.
# repeat as long as there is stuff in the slurped line
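To make the walk-through above concrete, here is a minimal sketch that wraps the same s/// line in a sub and runs it on a short sample string. The name split_html and the sample input are my own, purely for illustration. It also stops if a pass removes nothing, so an unclosed "<" cannot keep the loop spinning.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# split_html is a hypothetical helper name used only for this sketch.
# It returns a list of [ kind, string ] pairs instead of printing.
sub split_html {
    my ($html) = @_;
    my @parts;
    while (length $html) {
        # Same substitution as above: optional text, then optional tag
        $html =~ s/^([^<]+)?(<[^>]+>)?//;
        # Neither group matched, so nothing was removed: give up
        last unless defined $1 or defined $2;
        push @parts, [ 'Text', $1 ] if defined $1;
        push @parts, [ 'HTML', $2 ] if defined $2;
    }
    return @parts;
}

my @parts = split_html('Hello <b>world</b>!');
print "$_->[0]: $_->[1]\n" for @parts;
```

For that input the loop runs three times: the first two passes each strip a piece of text plus the tag after it, and the last pass strips the trailing "!".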
Of course, everything above could probably be done
more correctly by using
one of the HTML modules, like HTML::Parser, so
you may want to take a look at these modules. takshaka has
mentioned a previous discussion of this topic where he posted a working
example of HTML::Parser usage - a direct
link is here.
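For a rough idea of what the HTML::Parser route looks like, here is a minimal sketch. It is not the example takshaka posted, and it assumes HTML::Parser with its version 3 API is installed from CPAN. The sample string and the @events array are my own for illustration. Handlers are registered for start tags, end tags and text, and each one receives the original source text of the piece it was called for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Collect what the parser reports, in order (illustrative only)
my @events;

my $parser = HTML::Parser->new(
    api_version => 3,
    # 'text' asks for the original source text of each piece
    start_h => [ sub { push @events, "HTML: $_[0]" }, 'text' ],
    end_h   => [ sub { push @events, "HTML: $_[0]" }, 'text' ],
    text_h  => [ sub { push @events, "Text: $_[0]" }, 'text' ],
);

# Sample input containing the ">" inside an attribute that trips up the regex
$parser->parse('<p>See <img src="less.png" alt="a > b"> here</p>');
$parser->eof;

print "$_\n" for @events;
```

The point of the sample is the IMG tag from the comment above: the parser should keep the quoted "a > b" inside the tag, where the simple regex would cut the tag short at the first ">".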
For more information about regular expressions,
read the perlre manpage.