NetWallah nailed it (if, indeed, we understand what you're trying to accomplish), because parsing HTML with homebrew regexen is simply too easy to screw up.
IOW, use an appropriate module which, having stood the test of at least some time (and the terrors of CPAN's testing process), is more apt to be reliable than the one-off the newbie invents.
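For what it's worth, here is a minimal sketch of the module route. It assumes HTML::TokeParser (from the HTML::Parser distribution on CPAN, not core Perl) is installed; the sub name start_tags and the sample markup are made up for illustration, and this is only one guess at what you're after:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;    # CPAN, ships with HTML::Parser; not core Perl

# Hypothetical helper: return the original text of every start tag
# in a string of HTML, in document order.
sub start_tags {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new( \$html )
        or die "Can't create parser: $!";
    my @tags;
    while ( my $token = $parser->get_token ) {
        # Start-tag tokens look like ["S", $tag, \%attr, \@attrseq, $text]
        next unless $token->[0] eq 'S';
        push @tags, $token->[4];
    }
    return @tags;
}

# Made-up sample; the '>' inside the attribute value is the kind of
# thing a simple pattern such as /<[^>]*>/ stops short on.
my $sample = '<p title="a > b">x</p><br /><p>y</p>';
print "$_\n" for start_tags($sample);
```

The parser tracks quoting and tag boundaries itself, so the caller never has to guess where a tag ends.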
However, maybe you really meant something like this?
#!/usr/bin/perl
use strict;
use warnings;
# 888817

my @array;
my $file = "888817.txt";
open my $fh, '<', $file or die "Can't open $file: $!";
my @lines = <$fh>;
close $fh;

for my $line (@lines) {
    next if $line =~ /^\n/;    # skip empty lines

    # list context: $found gets the first tag-like match on the line
    my ($found) = $line =~ m/<.[^>]*>/g;
    next unless defined $found;    # no tag on this line
    print "\$found: $found \n";
    push @array, $found;
}

for my $i ( 0 .. $#array ) {
    print "The Element $i is $array[$i]\n";
}

# report every pair of identical tags
for my $j ( 0 .. $#array ) {
    for my $k ( $j + 1 .. $#array ) {
        if ( $array[$j] eq $array[$k] ) {
            print "substring($array[$j],$array[$k])\n";
        }
    }
}
Where the data looked like this:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
<link type="text/css" rel="stylesheet" href="NHC1.css" />
<link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
<title>(ww)Abcdef Hose Company #2 - Home</title>
</head>
<body>
<div id="title">
<span style="color: #cc0000; background-color: black;">Address: </span> 26 New Avenue, Abcdef, NY <span style="color: #cc0000; background-color: black;">phone: </span>nnn.nnn.nnnn</div>
<address>
Abcdef Hose Co. #1<br />
</address>
<br />
<p style="color: black; background-color: transparent; line-height: 99%;">Do you have something that you would like to see on the website? If so, let us know (use the email link below) and we will try to incorporate it.</p>
<p><a href="mailto:ww@Abcdef.org"><img src="gfx/box.gif" alt="contact webmaster" width="43" height="55" />Email webmaster</a></p>
</div> <!-- end div left sidebar (lsb) -->
<div id="main">
<div id="main_header" style="width:100%;">
<h1 style="text-align: center;">Abcdef Hose Company #1</h1>
<img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
</div> <!-- end main_header -->
<div id="news">
<div style="text-align:left;" class="style1"><strong>Latest Hot Stuff...</strong></div>
<p>Special Drill, 10am, Tuesday, 14 May: MA companies at Hoovertown Mall</p>
</div> <!-- end div news -->
</div> <!-- end div main -->
</body>
</html>
producing this output:
$found: <?xml version="1.0" encoding="UTF-8"?>
$found: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
$found: <html xmlns="http://www.w3.org/1999/xhtml">
$found: <head>
$found: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
$found: <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
$found: <link type="text/css" rel="stylesheet" href="NHC1.css" />
$found: <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
$found: <title>
$found: </head>
$found: <body>
$found: <div id="title">
$found: <span style="color: #cc0000; background-color: black;">
$found: <address>
$found: <br />
$found: </address>
$found: <br />
$found: <p style="color: black; background-color: transparent; line-height: 99%;">
$found: <p>
$found: </div>
$found: <div id="main">
$found: <div id="main_header" style="width:100%;">
$found: <h1 style="text-align: center;">
$found: <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
$found: </div>
$found: <div id="news">
$found: <div style="text-align:left;" class="style1">
$found: <p>
$found: </div>
$found: </div>
$found: </body>
$found: </html>
The Element 0 is <?xml version="1.0" encoding="UTF-8"?>
The Element 1 is <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The Element 2 is <html xmlns="http://www.w3.org/1999/xhtml">
The Element 3 is <head>
The Element 4 is <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
The Element 5 is <meta name="DESCRIPTION" content="Abcdef Hose Co. #1 -- protecting the Abcdef, New York" />
The Element 6 is <link type="text/css" rel="stylesheet" href="NHC1.css" />
The Element 7 is <link rel="shortcut icon" href="http://Abcdef.org/favicon.ico" />
The Element 8 is <title>
The Element 9 is </head>
The Element 10 is <body>
The Element 11 is <div id="title">
The Element 12 is <span style="color: #cc0000; background-color: black;">
The Element 13 is <address>
The Element 14 is <br />
The Element 15 is </address>
The Element 16 is <br />
The Element 17 is <p style="color: black; background-color: transparent; line-height: 99%;">
The Element 18 is <p>
The Element 19 is </div>
The Element 20 is <div id="main">
The Element 21 is <div id="main_header" style="width:100%;">
The Element 22 is <h1 style="text-align: center;">
The Element 23 is <img src="gfx/hoseco2.jpg" alt="Shoulder Patch: Abcdef Hose Company #2" width="235" height="255" hspace="250" />
The Element 24 is </div>
The Element 25 is <div id="news">
The Element 26 is <div style="text-align:left;" class="style1">
The Element 27 is <p>
The Element 28 is </div>
The Element 29 is </div>
The Element 30 is </body>
The Element 31 is </html>
substring(<br />,<br />)
substring(<p>,<p>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
substring(</div>,</div>)
... (in which I still see no rhyme nor reason, but as they say: diff'rent strokes for diff'rent folks).
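If the point of the nested loops was to spot repeated tags, a hash tally is one possible alternative: it reports each repeated tag once, with a count, rather than once per matching pair. This is only a sketch built on my guess at the goal. The sub names all_tags and tally_repeats and the sample lines are invented here, and the pattern is the same naive one used in the script above, so the usual caveats about regexen and HTML still apply:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: every tag-like match on a line, not just the first.
# Same naive pattern as the script above; call it in list context.
sub all_tags {
    my ($line) = @_;
    return $line =~ m/<.[^>]*>/g;
}

# Hypothetical helper: count each tag and return (tag => count) pairs
# for the tags seen more than once.
sub tally_repeats {
    my @tags = @_;
    my %count;
    $count{$_}++ for @tags;
    return map { $_ => $count{$_} } grep { $count{$_} > 1 } keys %count;
}

# Made-up sample lines standing in for the file contents
my @lines = (
    qq{<div id="news">\n},
    qq{<p>one</p><br />\n},
    qq{<p>two</p><br />\n},
    qq{</div>\n},
);
my @tags    = map { all_tags($_) } @lines;
my %repeats = tally_repeats(@tags);
print "$_ appears $repeats{$_} times\n" for sort keys %repeats;
```

With four identical closing tags, the pairwise loops print six lines where the tally prints one, which may or may not be what was wanted.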