http://www.perlmonks.org?node_id=941732

sdyates has asked for the wisdom of the Perl Monks concerning the following question:

Guys, I've been up most of the night and cannot get the output written to a file properly. I think my Mac is messing things up... I can parse the company, device, date, interface, and autoconfig fields properly. However, I cannot parse the list properly. I can output to the display OK, but the file writes are messed up. Below is the Perl file. If you need to see the four HTML files, I can email them; they are supposed to be in the folder wwwindex.

Thanks, S

Perl file:

#===============================
# Adding a link record to the link database
# Emmie P. Lewis, Perl/PHP Guide - About.com
# http://perl.about.com
# Created 04/30/01
# Last modified 04/30/11
#==============================================
#use strict;
#use DBI;
use open qw( :encoding(UTF-8) :std );

#my $dbh; # the database handle
#my $SQLCommand = "select GamePic1, GameTitle, GameDescription, GameDevice from gameinventory where GameDevice like 'zx spectrum' order by GameTitle;";

my $output = "hardwaredatabase.txt";
my $directory = "wwwindex"; #directory where index.html files are located
my $device;
my $deviceFileName;
my $company;
my $date;
my $amiga;
my $interface;
my $autoconfig;
my @TITLE;
my $title;
my $point;

#Start Logic
DirectoryFileSearch();

#Insert html into file
DumpDB2File();

#==================== [ DumpDB2File ] ====================
sub DumpDB2File {
}

#==================== [ open_file ] ====================
sub OpenFile {
    open (OUTPUT, ">>$output") or die "Cannot open $output";
}

#>-----------------------[ OpenDirectory ]------------------------<
sub DirectoryFileSearch {
    if (substr($directory, -1,1) ne "\\") {
        $directory = "$directory" . "\\";
    }
    # Instead of using open, opendir takes all files with in a
    # directory and places them inside an array: @FileInDirectory
    opendir(DIR, $directory) or die "Cannot open $directory";
    @FileInDirectory = readdir DIR;
    close(DIR);
    # removes '.' and '..' from the array. For file systems without
    # these objects, comment out the following line.
    splice(@FileInDirectory,0,2);
    &Engine;
}

sub Engine {
    foreach $FileInDirectory(@FileInDirectory) {
        $FilePath = "$directory" . "$FileInDirectory[$z]";
        print "\n\nFile path: $FilePath \n";
        open (SINGLE, $FilePath) or die "Cannot open $FilePath";
        while (<SINGLE>) {
            $SFS = $_;
            &ExtractText;
            close OUTPUT;
            # if ($SFS =~ m/$string/) {
            #     open (OUTPUT, ">>$output") or die "Cannot open $output";
            #     print OUTPUT "$FileInDirectory[$z]" . " - " . "$SFS";
            #     close OUTPUT;
            # }
            $word =0;
            $line =0;
            $char = 0;
        }
        close single;
        $z = $z +1;
    }
}

sub ExtractText {
    # open (OUTPUT, ">>$output") or die "Cannot open $output";
    if ( $SFS =~ m:<big>(.*?)</big>: ) { # looks for <big></big and capturew text inetween
        $device = $1;
        $device =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $device =~ tr/""//d; # removes double quotes
        print $device;
        $DeviceFileName = $device;
        $DeviceFileName =~ s/[^a-zA-Z0-9]//g; #removes all spaces & special characters
        print $DeviceFileName;
    }
    if ( $SFS =~ m:</font>(.*?)</small>: ) { # looks for <big></big and capturew text inetween
        $company = $1;
        $company =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $company =~ tr/""//d; # removes double quotes
    }
    if ( $SFS =~ m:Date(.*?)</small>: ) { # looks for <big></big and capturew text inetween
        $date = $1;
        $date =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $date =~ tr/""//d; # removes double quotes
    }
    if ( $SFS =~ m:Amiga(.*?)</small>: ) { # looks for <big></big and capturew text inetween
        $amiga = $1;
        $amiga =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $amiga =~ tr/""//d; # removes double quotes
    }
    if ( $SFS =~ m:Interface(.*?)</small>: ) { # looks for <big></big and capturew text inetween
        $interface = $1;
        $interface =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $interface =~ tr/""//d; # removes double quotes
    }
    if ( $SFS =~ m:Autoconfig(.*?)</small>: ) { # looks for <big></big and capturew text inetween
        $autoconfig = $1;
        $autoconfig =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $autoconfig =~ tr/""//d; # removes double quotes
    }
    while ($SFS =~ m:<UL>(.*?)</I>:g) {
        $title = $1;
        $title = $title . "\n";
        print "\n". $1;
        $title =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $title =~ s/\b(\w)/\u$1/g;
        chomp $title;
        $format = "<UL><I>".$title."</I>\n";
        $title = $format;
        push (@Tarray, $title);
        # $title =~ tr/""//d; # removes double quotes
        # }
    }
    while ( $SFS =~ m:<LI>(.*?)$:g ) { # looks for <big></big and capturew text inetween
        $point = $1;
        $point = $point . "\n";
        print "\n". $1;
        $point =~ s/<[^>]*>//g; # removes all characters between <> including <>
        $point =~ s/\b(\w)/\u$1/g; # Capitalises first word
        $format = "<LI>".$point."\n";
        $point = $format;
        chomp $point;
        push (@Tarray, $point);
        # $point =~ tr/""//d; # removes double quotes
    }
    &MakeHTML;
    open (OUTPUT, ">$DeviceFileName.php") or die "Cannot open $output";
    print OUTPUT $HTML1.$HTML2;
    print OUTPUT @Tarray;
    print OUTPUT @Tpoint;
    print OUTPUT "</UL>";
}

sub MakeHTML {
$HTML1='<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>AT-Bus 508 :: RetroGameAndComputer.com</title>
<link rel="Shortcut Icon" href="favicon.ico">
<meta property="fb:admins" content="654435532" />
<META name="Abstract" content="Retro Amiga Storage Peripherals '. $device .'">
<META name="Description" content="The '. $device .' with specifications, interface slot and front and back images.">
<META name="KeyWords" content="'. $device .', '. $company .', Amiga Storage Peripherals, expansion cards">
<META name="Copyright" content="Copyright &copy; 2011 | RetroGameAndComputer.com || All RightsReserved">
<META http-equiv="content-language" content="en">
<META http-equiv="Pragma" content="no-cache">
<META http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META name="Robots" content="index,follow">
<META name="Robots" content="noarchive">
<META name="Googlebot" content="noarchive">
<META name="MSNbot" content="noarchive">
<META name="Security" content="Public">
<META name="Distribution" content="Global">
<link rel="StyleSheet" href="css/common-style.css" type="text/css">
<link rel="StyleSheet" href="css/gray.css" type="text/css">
<link rel="StyleSheet" href="css/gray-menu.css" type="text/css">
<script language="JavaScript" type="text/javascript" src="css/css.js"></script>
<script language="JavaScript" type="text/javascript" src="javascripts.js"></script>
<script language="JavaScript" type="text/javascript" src="pop-closeup.js"></script>
<script src="Scripts/AC_RunActiveContent.js" type="text/javascript"></script>
<!-- Start Google Analytics -->
<script type="text/javascript">
  var _gaq = _gaq || [];
  _gaq.push([\'_setAccount\', \'UA-21373706-3\']);
  _gaq.push([\'_trackPageview\']);
  (function() {
    var ga = document.createElement(\'script\'); ga.type = \'text/javascript\'; ga.async = true;
    ga.src = (\'https:\' == document.location.protocol ? \'https://ssl\' : \'http://www\') + \'.google-analytics.com/ga.js\';
    var s = document.getElementsByTagName(\'script\')[0]; s.parentNode.insertBefore(ga, s);
  })();
</script>
<!-- End Google Analytics -->
</head>
<BODY BGCOLOR="#FFFFFF" TEXT="#000000">
<!-- Start Outer Page Table -->
<table bgcolor="#FFFFFF" align="center" cellpadding="0" cellspacing="0" border="1" width="994">
  <tr>
    <td align="center" valign="top"><script language="JavaScript" type="text/javascript" src="header.js"></script>
      <!-- Start Picture Bar Table -->
      <table cellpadding="0" cellspacing="0" border="0" width="100%">
        <tr>
          <td background="picts/home.jpg" class="pictbackground"><img src="img/headers/retrogameandcomputer.jpg" border="0" width="100%" height="80" alt="Image"><br></td>
        </tr>
        <tr class="printhide">
          <td class="pagebars"><img src="picts/spacer.gif" width="10" height="1" alt="image"><br></td>
        </tr>
      </table>
      <!-- End Picture Bar Table -->
      <script language="JavaScript" type="text/javascript" src="menu.js"></script>
      <!-- Start Split Table -->
      <table cellpadding="0" cellspacing="0" border="0" width="100%">
        <tr>
          <td align="left" valign="top" class="sidebar-background"><iframe name="Sidebar" src="sidebar.htm" width="187" height="600" frameborder="0" marginheight="0" marginwidth="0" scrolling="no" class="sidebar-frame"> </iframe>
            <!-- Start Left Sidebar Content Table -->
            <TABLE cellpadding="10" cellspacing="0" border="0" width="187" class="sidebartext">
              <tr>
                <td align="left" valign="top">
                  <?php include("inc/google.search.inc.php"); ?>
                  <br>
                  <!-- Start Facebook iFrame-->
                  <iframe src="http://www.facebook.com/plugins/like.php?href=http%3A%2F%2Fretrogameandcomputer.com&amp;layout=button_count&amp;show_faces=true&amp;width=450&amp;action=like&amp;font&amp;colorscheme=light&amp;height=21" scrolling="no" frameborder="0" style="border:none; overflow:hidden; width:167px; height:21px;" allowTransparency="true"></iframe>
                  <!-- End Facebook iFrame-->
                  <br><br><br>
                  <!-- Place this tag in your head or just before your close body tag -->
                  <!-- Place this tag where you want the +1 button to render -->
                  <script type="text/javascript" src="https://apis.google.com/js/plusone.js"></script>
                  <g:plusone></g:plusone><br>
                  <br>
                  <p class="sidebartext" align="justify">Retro Game And Computer covers retro game consoles, home computers, classic video games and hardware peripherals.</p>
                  <p class="sidebartext" align="justify">We cover all computers systems and game forms from the 1970s including Atari, Commodore, ZX Spectrum, Sega, Sony, Nintendo and more.</p>
                  <br><div align="center"><a class="sidelink" href="link-to-us.php"><img border="0" src="img/linktous.png" alt="Link To Us"></a></div>
                  <br><p class="sidebartitle">Advertise With Us</p>
                  <p class="sidebartext" align="justify">Find out how to advertise with RetroGameAndComputer.com. It\'s quick and easy and we we have advertising solutions for any budget.</p>
                  <p class="sidebartext">:: <a class="sidelink" href="advertising.php">Advertise With Us</a></p>
                  <br>
                  <br>
                  <br>
                  <script type="text/javascript"><!--
                  google_ad_client = "ca-pub-1786059499637793";
                  /* wide game site */
                  google_ad_slot = "1105492661";
                  google_ad_width = 160;
                  google_ad_height = 600;
                  //-->
                  </script>
                  <script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
                  </script>
                  <p class="sidebartitle">Reciprocal Links</p>
                  <p class="sidebartext">We encourage a reciprocal link policy. If you have a web site that relates to bass fishing please feel free to link to us and let us know. We\'ll link back in return. Visit our <a class="sidelink" href="link-to-us.php">Link To Us</a> page for details on how to link to us.</p>
                  <p class="sidebartext">:: <a class="sidelink" href="link-to-us.php">Link To Us</a></p></td>
              </tr>
            </table>
            <!-- End Left Sidebar Content Table --></td>
          <td align="left" valign="top" width="40" class="pageheight"><div id="cornerimage"> <img src="picts/spacer.gif" height="40" width="40" alt="image"><br>
            </div>
            <img src="picts/spacer.gif" height="200" width="10" border="0" alt="image"><br></td>
          <td align="center" valign="top" class="shadow-horizontal"><br>
            <br>
            <!-- End Left Sidebar Content Table -->
';

$HTML2 ='<!--------------------------------------------->
<!-- Start Main Secrtion -->
<!--------------------------------------------->
            <table cellpadding=\"0\" cellspacing=\"0\" border=\"0\" width=\"100%\">
              <tr>
                <td align=\"left\" valign=\"top\" class=\"just\"><!-- Start Main Content Table -->
                  <p class=\"smalltext\"><a href=\"index.php\">HOME</a>&gt;&gt; <a href=\"retro-peripherals.php\">Peripherals</a> &gt;&gt; <a href=\"amiga-peripherals.php\">Amiga</a> &gt;&gt; <a href=\"$deviceFileName\">Storage</a> &gt;&gt; </a> '. $device .'</p>
                  <h1>Amiga Storage Peripherals</h1>
                  <hr class=\"page-splits\">
                  <!-- Start Main Column Content - Single Column -->
                  <table width=\"100%\" border=\"0\" cellpadding=\"7\">
                    <tr>
                      <td>
                        <h2><strong>'. $device .'</strong></h2>
                        <p align=\"justify\">> Company: '. $company .'<br>
                        > Date: '. $date .'<br>
                        > Amiga: '. $amiga .'<br>
                        > Interface: '. $interface .'<br>
                        > Autoconfig ID: '. $autoconfig .'<br>
                        </p>
                        <br>
                        <br>
';
}

#>-----------------------[ CleanDescription ]------------------------<
sub CleanDescription {
    $desc = $LINE[2];
    $desc =~ s/\r|\n//g;
    $desc1 = substr($desc,0,480);
    $description = $desc1;
}

sub CreateFilename {
    $name1 = $LINE[1];
    $name1 =~ s/\s//g; # reoves spaces
    $name1 =~ s/[^\w]//g; # reoves non alphanumerica
    $name = substr($name1,0,16);
    $name = $name . "-$cons.jpg";
}

sub FileName {
    $filename = $LINE[0];
    $filename =~ s/\.[^.]+$//;
    $filename = $filename . ".jpg";
}

Here is one of the html files: index.html:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >
<title>Amiga Hardware Database - Nicolas Welte 2MB RAM for A570</title>
<meta name='keywords' content='amiga,hardware,Nicolas Welte (N.Welte) 2MB RAM for A570'>
<meta name='description' content='Amiga Hardware Database - Nicolas Welte 2MB RAM for A570'>
<link rel="stylesheet" href="../../lexsite.css" type="text/css">
<script type="text/javascript" language="javascript">if (top != self) top.location.href = location.href;</script>
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>
<script type="text/javascript">
_uacct = "UA-137763-1";
urchinTracker();
</script>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-3971897-1");
pageTracker._trackPageview();
} catch(err) {}</script>
</head>
<body bgcolor="#0c0e64" background="../../images/pat-amiga.gif" text="#ffffff" link="#c7d3ef" vlink="#9eb8da" alink="#f2cac5">
<center>
<table width=592>
<tr><td>
<map name=main>
  <area shape=rect coords="27,0,111,60" href="../../index.html#topics" title="Home">
</map>
<center>
<table width=565 height=61 cellspacing=0 cellpadding=0 border=0>
  <tr>
    <td><img src="../../images/head-ls.gif" width=307 height=61 border=0 alt="Amiga Hardware Database" usemap="#main"></td>
    <td width=258><img src="../../images/head-ta.gif" width=258 height=31 alt=""><a href="../../expansion.html"><img src="../../images/head-ex.gif" width=258 height=30 border=0 alt="Expansion cards" title="List of expansion categories"></a></td>
  </tr>
</table>
</center>
<br><br>
<br><table width=592 cellspacing=0 cellpadding=0 border=0><tr valign=top><td width=524>
<b><big><a name='2mb570'>2MB RAM for A570</a></big></b></td>
<td><img src='../../images/link-ul.gif' width=50 height=16></td>
<td><a href='#t_2mb570'><img src='../../images/link-photo.gif' width=38 height=16 alt='Photo' border=0></a></td>
<td><img src='../../images/link-ur.gif' width=55 height=16></td>
<tr><td colspan=4><table><tr valign=top>
<td><small><font color='#c88c96'>Company</font><br>Nicolas Welte, Germany</small></td><td>&nbsp;</td><td><small><font color='#c88c96'>Date</font><br>2003</small></td><td>&nbsp;</td><td nowrap><small><font color='#c88c96'>Amiga</font><br>A500<br></small></td><td><small>&nbsp;</small></td><td nowrap><small><font color='#c88c96'>Interface</font><br>A570<br></small></td><td>&nbsp;</td></table></td></table>
<UL>
<LI>2 MB Fast RAM expansion
<LI>connects to the 40 pin expansion header of the <A HREF="../a570/index.html">Commodore A570</A>
<LI>four 1M&times;4 chips soldered on board
<LI>disable jumper
<LI>nothing else is required to be on the board as the multiplexing circuit is already in the A570
</UL>
<p></p><ul>
<table border="0" cellpadding="3" cellspacing="0"><tbody>
<tr><td><small class="sw2"><a name=t_2mb570>Photo</a></small><hr></td></tr>
<tr><td>
<table width="100%">
<tr colspan=2 valign="top"><td class="sw2" align="center" valign="top">
<a href='../../photos/2mb570,1/index.html' width="70" height="54" target=_top border=0><img src='../../photos/thumbnails/2mb570.png' alt="Nicolas Welte 2MB RAM for A570 - front side" title="Nicolas Welte 2MB RAM for A570 - front side"></a><br>
<small class="sw2"> front side</small><br><br>
</td><td class="sw2" align="center" valign="top">
</td></tr>
</table>
</td></tr>
</tbody></table></ul>
<br><br><br>
<center>
<table cellspacing=0 cellpadding=0 border=0>
  <tr>
    <td><a href="../../about.pl" title="About the Amiga Hardware Database"><img src="../../images/foot-ab.gif" width=136 height=27 border=0 alt="About"></a></td>
    <td><a href="../search.pl" title="Advanced search"><img src="../../images/foot-se.gif" width=172 height=27 border=0 alt="Search"></a></td>
    <td><a href="../../links.html" title="Companies, dealers, related sites"><img src="../../images/foot-li.gif" width=128 height=27 border=0 alt="Links"></a></td>
  </tr>
</table>
</center>
</td>
</tr>
</table>
</center>
</body>
</html>
<!-- Localized -->

And one file that has a more complex list:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" >
<title>Amiga Hardware Database - Computer System Associates Twelve Gauge (Derringer 1250)</title>
<meta name='keywords' content='amiga,hardware,Computer System Associates (CSA) Twelve Gauge (Derringer 1250)'>
<meta name='description' content='Amiga Hardware Database - Computer System Associates Twelve Gauge (Derringer 1250)'>
<link rel="stylesheet" href="../../lexsite.css" type="text/css">
<script type="text/javascript" language="javascript">if (top != self) top.location.href = location.href;</script>
<script src="http://www.google-analytics.com/urchin.js" type="text/javascript"></script>
<script type="text/javascript">
_uacct = "UA-137763-1";
urchinTracker();
</script>
<script type="text/javascript">
var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www.");
document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
try {
var pageTracker = _gat._getTracker("UA-3971897-1");
pageTracker._trackPageview();
} catch(err) {}</script>
</head>
<body bgcolor="#0c0e64" background="../../images/pat-amiga.gif" text="#ffffff" link="#c7d3ef" vlink="#9eb8da" alink="#f2cac5">
<center>
<table width=592>
<tr><td>
<map name=main>
  <area shape=rect coords="27,0,111,60" href="../../index.html#topics" title="Home">
</map>
<center>
<table width=565 height=61 cellspacing=0 cellpadding=0 border=0>
  <tr>
    <td><img src="../../images/head-ls.gif" width=307 height=61 border=0 alt="Amiga Hardware Database" usemap="#main"></td>
    <td width=258><img src="../../images/head-ta.gif" width=258 height=31 alt=""><a href="../../expansion.html"><img src="../../images/head-ex.gif" width=258 height=30 border=0 alt="Expansion cards" title="List of expansion categories"></a></td>
  </tr>
</table>
</center>
<br><br>
<br><table width=592 cellspacing=0 cellpadding=0 border=0><tr valign=top><td width=524 rowspan=2>
<b><big><a name='12gauge'>Twelve Gauge (Derringer 1250)</a></big></b></td>
<td><img src='../../images/link-ul.gif' width=50 height=16></td>
<td><a href='#t_12gauge'><img src='../../images/link-photo.gif' width=38 height=16 alt='Photo' border=0></a></td>
<td><a href='#s_12gauge'><img src='../../images/link-soft.gif' width=55 height=16 alt='Software' border=0></a></td>
<tr><td colspan=3 align=center>
<img src=../../images/link-bl.gif width=68 height=14><a href='../../perf/aibb-ref=12gauge.pl#12gauge' target=aibb><img src='../../images/link-ai.gif' width=42 height=14 alt='AIBB' border=0></a>
<tr><td colspan=4><table><tr valign=top>
<td><small><font color='#c88c96'>Company</font><br>Computer System Associates, USA</small></td><td>&nbsp;</td><td><small><font color='#c88c96'>Date</font><br>1993</small></td><td>&nbsp;</td><td nowrap><small><font color='#c88c96'>Amiga</font><br>A1200<br></small></td><td><small>&nbsp;</small></td><td nowrap><small><font color='#c88c96'>Interface</font><br>trapdoor slot<br></small></td><td>&nbsp;</td><td nowrap><small><font color='#c88c96'>Autoconfig ID</font><br>1058 / 21<br></small></td>
</table></td></table>
<UL><I>processor</I>
<LI>68EC030 @ 40 MHz or 68030 @ 33 / 50 MHz, PGA
<LI>optional 68882 PGA FPU
</UL>
<UL><I>memory</I>
<LI>one 72 pin SIMM socket accepts 32 MB RAM
<LI>supports 4, 8, 16, 32 MB SIMMs, 60-70 ns
<LI>burst RAM access
</UL>
<UL><I>optional modules</I>
<LI>SCSI controller
  <UL>
  <LI>NCR 53C80 controller IC
  <LI>does not use DMA transfer
  <LI>autoboot ROM (csascsi.device)
  <LI>DB25 external SCSI connector
  <LI>supported by NetBSD and OpenBSD
  </UL>
<LI>networking controller
</UL>
<p></p><ul>
<table border="0" cellpadding="3" cellspacing="0"><tbody>
<tr><td><small class="sw2"><a name=t_12gauge>Photo</a></small><hr></td></tr>
<tr><td>
<table width="100%">
<tr colspan=2 valign="top"><td class="sw2" align="center" valign="top">
<a href='../../photos/12gauge,1/index.html' width="155" height="101" target=_top border=0><img src='../../photos/thumbnails/12gauge.png' alt="Computer System Associates Twelve Gauge (Derringer 1250) - front side" title="Computer System Associates Twelve Gauge (Derringer 1250) - front side"></a><br>
<small class="sw2"> front side</small><br><br>
</td><td class="sw2" align="center" valign="top">
<a href='../../photos/12gauge,2/index.html' width="154" height="103" target=_top border=0><img src='../../photos/thumbnails/12gauge-back.png' alt="Computer System Associates Twelve Gauge (Derringer 1250) - back side" title="Computer System Associates Twelve Gauge (Derringer 1250) - back side"></a><br>
<small class="sw2"> back side</small><br><br>
</td></tr>
</table>
</td></tr>
</tbody></table></ul>
<p><ul>
<table cellspacing=0 cellpadding=3 border=0>
<tr><td colspan=2><small class=sw2><a name=s_12gauge>Software</a></small><hr></td>
<tr>
<td class=sw1><a href=../../install/Twelve_Gauge_U62.bin>Twelve_Gauge_U62.bin</a> <font size=-1 color='#ffbeb4' class=comment>(32 kB)</font>
<br>firmware v1.0<br>22162A / R1.0 / 0874</td>
<td>&nbsp; <small class=sw2></small></td>
</table>
</ul>
<br><br><br>
<center>
<table cellspacing=0 cellpadding=0 border=0>
  <tr>
    <td><a href="../../about.pl" title="About the Amiga Hardware Database"><img src="../../images/foot-ab.gif" width=136 height=27 border=0 alt="About"></a></td>
    <td><a href="../search.pl" title="Advanced search"><img src="../../images/foot-se.gif" width=172 height=27 border=0 alt="Search"></a></td>
    <td><a href="../../links.html" title="Companies, dealers, related sites"><img src="../../images/foot-li.gif" width=128 height=27 border=0 alt="Links"></a></td>
  </tr>
</table>
</center>
</td>
</tr>
</table>
</center>
</body>
</html>
<!-- Localized -->

Re: Can't Write to Files properly
by Not_a_Number (Prior) on Dec 04, 2011 at 18:51 UTC

    Where to start?

    Well, the first red flag is that you comment out use strict; towards the top of your script...

    Scrolling down to the bottom, your last three subs try to use a non-existent global @LINE array, a problem which of course strictures would have identified.

    My advice: go to bed, get a good night's sleep (and try not to dream of Perl...) :)

    And tomorrow, after uncommenting use strict; and adding use warnings;, work through your code until you've fixed all the errors thereby revealed.

    Then, if it still doesn't work, remove everything from your test code that has no relevance to the problem at hand (particularly, but not exclusively, the 170+ lines in sub MakeHTML), after which we might be willing to debug any remaining errors.
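To illustrate the advice above, here is a minimal sketch of what strictures catch. The data and the helper name `clean_line` are hypothetical stand-ins, not part of the posted script; only the `@LINE` reference mirrors the original.

```perl
use strict;
use warnings;

# Hypothetical miniature of the posted script: every variable is declared
# with 'my', so a misspelled or never-declared name becomes a compile error.
my @lines = ( "first title\n", "second title\n" );    # stand-in data

sub clean_line {
    my ($line) = @_;
    $line =~ s/\r|\n//g;               # strip line endings
    return substr( $line, 0, 480 );    # keep at most 480 characters
}

my @cleaned = map { clean_line($_) } @lines;
print "$_\n" for @cleaned;

# Compiling a snippet that reads an undeclared array, as the last three subs
# of the posted script do with @LINE, fails under strict.
my $ok = eval q{ use strict; my $desc = $LINE[2]; 1 };
print "strict says: $@" unless $ok;
```

Without `use strict`, the same `$LINE[2]` would silently evaluate to undef, which is why the problem goes unnoticed in the original.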

Re: Can't Write to Files properly
by CountZero (Bishop) on Dec 04, 2011 at 20:48 UTC
    Here is a small program that extracts all the list-items from the HTML:
    use Modern::Perl;
    use HTML::PullParser;
    use Data::Dump qw/dump/;

    my $document_to_parse = 'html_1.html';
    my $p = HTML::PullParser->new(
        file  => $document_to_parse,
        start => '"S", tagname, text',
        end   => '"E", tagname, text',
        text  => '"T", text',
    ) or die "Error: $!";

    my @data;
    my $text = '';
    while ( my $token = $p->get_token ) {
        if ( ( $token->[0] eq 'S' and $token->[1] eq 'ul' )
            .. ( $token->[0] eq 'E' and $token->[1] eq 'ul' ) )
        {
            if ( $token->[0] eq 'T' ) {
                $token->[1] =~ s/(\s*)$//g;
                $text .= "$token->[1] " if $token->[1];
            }
            if (   ( $token->[0] eq 'S' and $token->[1] eq 'li' )
                or ( $token->[0] eq 'E' and $token->[1] eq 'ul' ) )
            {
                $text =~ s/(\s*)$//g;
                push @data, $text if $text;
                $text = '';
            }
        }
    }
    say dump(@data);
    It returns the following from the first HTML:
    2 MB Fast RAM expansion
    connects to the 40 pin expansion header of the Commodore A570
    four 1M&times;4 chips soldered on board
    disable jumper
    nothing else is required to be on the board as the multiplexing circuit is already in the A570
    Photo front side
    And for the second HTML:
    processor
    68EC030 @ 40 MHz or 68030 @ 33 / 50 MHz, PGA
    optional 68882 PGA FPU
    memory
    one 72 pin SIMM socket accepts 32 MB RAM
    supports 4, 8, 16, 32 MB SIMMs, 60-70 ns
    burst RAM access
    optional modules
    SCSI controller
    NCR 53C80 controller IC
    does not use DMA transfer
    autoboot ROM (csascsi.device)
    DB25 external SCSI connector
    supported by NetBSD and OpenBSD
    Photo front side back side
    Software Twelve_Gauge_U62.bin (32 kB) firmware v1.0 22162A / R1.0 / 0874 &nbsp;
    The matter got a bit complicated because the <LI> tags were not closed and inside some of the list items are other tags, so some re-assembly needed to be done.

    I have no idea how you want to handle the lists-inside-lists thing, so I just ignored this and made these into one flat array.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      I sure hope you collect your $20.

      HTH,

      planetscape
        I'll gladly exchange that $20 for some good karma.

        CountZero


      Thanks so much. This is perfect and gives me what I need to figure it out. Thanks for being patient and taking the time. Regards, S
Re: Can't Write to Files properly
by CountZero (Bishop) on Dec 04, 2011 at 18:37 UTC
    However, I cannot parse the list properly. I can output to display ok, but file writes are messed up.
    That is not a very helpful description of your problem.

    Actually you have two problems and you should handle them one by one:

    1. Parsing the HTML file into a data-structure. To look into that problem, please define what your data-structure should look like, and then we can perhaps help you achieve that.
    2. Writing the data-structure to a file. I *guess* some kind of templating system would be a good solution here.
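The two steps above can be sketched separately. This is only an illustration under stated assumptions: the hash layout, the field names, and the helper `render_device` are hypothetical, the values are copied by hand from the first sample page rather than parsed, and plain string concatenation stands in for a real templating system.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Step 1 (assumed result): one parsed page held in a single data structure.
my %device = (
    name      => '2MB RAM for A570',
    company   => 'Nicolas Welte, Germany',
    date      => '2003',
    amiga     => 'A500',
    interface => 'A570',
    features  => [
        '2 MB Fast RAM expansion',
        'four 1M&times;4 chips soldered on board',
        'disable jumper',
    ],
);

# Step 2: turn the structure into HTML text, independent of any parsing.
sub render_device {
    my ($d) = @_;
    my $html = "<h2><strong>$d->{name}</strong></h2>\n";
    for my $field (qw(company date amiga interface)) {
        $html .= ucfirst($field) . ": $d->{$field}<br>\n";
    }
    $html .= "<UL>\n";
    $html .= "<LI>$_\n" for @{ $d->{features} };
    $html .= "</UL>\n";
    return $html;
}

# Write once per device, using a lexical filehandle that is closed explicitly.
my ( $fh, $file ) = tempfile( SUFFIX => '.php' );
print {$fh} render_device( \%device );
close $fh or die "Cannot close $file: $!";
print "wrote $file\n";
```

Keeping the rendering in a function that only reads the structure means the output can be checked without touching the parser at all.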

    CountZero


      Using the two HTML files will better explain the data structure. The only part I am having trouble with is getting the list parsed. If you run from cmd, you will see how the list should be processed, but the output files mess up the list. I believe the problem is somewhat related to OUTPUT and the array. I have no idea what to do other than start over from scratch.
Re: Can't Write to Files properly
by keszler (Priest) on Dec 04, 2011 at 19:39 UTC

    $HTML1 and $HTML2 are created and defined in the MakeHTML subroutine. Once that subroutine ends they're out of scope. ExtractText then prints variables with the same name that are undefined.

    See Scoping for more info.

    If all you're interested in is a quick bandage (over an arterial laceration...), add

    my $HTML1;
    my $HTML2;
    after my $point;. Then get some sleep, uncomment use strict;, and start on a real fix.
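As a sketch of the "real fix" direction, the subroutine can return its results instead of relying on shared variables. The name `make_html` and the tiny HTML fragments are hypothetical. Note as well that in the posted script `$HTML1` and `$HTML2` are assigned without `my`, and with strictures off Perl treats such names as package globals, so whether scoping is the actual cause is an open question.

```perl
use strict;
use warnings;

my $device = 'AT-Bus 508';    # file-scoped lexical, declared before use

# Builds both fragments and hands them back to the caller; nothing is left
# behind in variables shared between subroutines.
sub make_html {
    my ($name) = @_;
    my $head = "<title>$name</title>\n";    # lexical: private to this call
    my $body = "<h2>$name</h2>\n";
    return ( $head, $body );
}

my ( $html1, $html2 ) = make_html($device);
print $html1, $html2;
```

With this shape, `use strict` has nothing to complain about, and each caller sees exactly the values it asked for.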

      I appreciate your help, but these steps are not fixing the problem, they just make things more proper. Here is a sample output:
      <p align=\"justify\">> Company: Commodore, USA<br>
      > Date: 1986<br>
      > Amiga: A2000, A3000, A4000<br>
      > Interface: Zorro II<br>
      > Autoconfig ID: ID514 / 10<br>
      </p>
      <br>
      <br>
      <UL><I>Processor</I>
      <LI> >68EC030 @ 40 MHz Or 68030 @ 33 / 50 MHz, PGA
      <LI> >Optional 68882 PGA FPU
      <UL><I>Memory</I>
      <LI> >One 72 Pin SIMM Socket Accepts 32 MB RAM
      <LI> >Supports 4, 8, 16, 32 MB SIMMs, 60-70 Ns
      <LI> >Burst RAM Access
      <UL><I>Optional Modules</I>
      <LI> >SCSI Controller
      <LI> >NCR 53C80 Controller IC
      <LI> >Does Not Use DMA Transfer
      <LI> >Autoboot ROM (Csascsi.Device)
      <LI> >DB25 External SCSI Connector
      <LI> >Supported By NetBSD And OpenBSD
      <LI> >Networking Controller
      <LI> >Combination Of The Catweasel Mk2 Floppy Controller And The Buddha Flash IDE Controller Built Into One Device
      <LI> >Features All Buddha Flash And Catweasel Z-II Mk2 Functions
      <LI> >Works With All A1200 Zorro Busboards
      <LI> >64 DIP Sockets Accept 2 MB RAM
      <LI> >Supports 0.5, 1 Or 2 MB Configurations
      <LI> >Accepts 256k&Times;1 DIPs Only
      </UL>
      The code that outputs the company name, date, and autoconfig fields all works, right down to the list. In this case, for the output file A2052.php, only these three lines should appear in the list:
      <LI> >64 DIP Sockets Accept 2 MB RAM
      <LI> >Supports 0.5, 1 Or 2 MB Configurations
      <LI> >Accepts 256k&Times;1 DIPs Only
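One possible reading of this symptom, offered as an assumption rather than a confirmed diagnosis: in the posted script `@Tarray` is pushed to for every input file but never emptied, so each output file would also carry the items collected from the files processed before it. The sketch below uses hypothetical stand-in data (`@pages`, `%rendered`) to show a list that starts empty for every file.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for two parsed pages: a name plus its list items.
my @pages = (
    [ 'TwelveGauge', [ 'burst RAM access', 'SCSI controller' ] ],
    [
        'A2052',
        [
            '64 DIP sockets accept 2 MB RAM',
            'supports 0.5, 1 or 2 MB configurations',
        ]
    ],
);

my %rendered;
for my $page (@pages) {
    my ( $name, $items ) = @$page;

    my @list;    # declared inside the loop: a fresh, empty array per file
    push @list, "<LI>$_\n" for @$items;

    $rendered{$name} = join '', "<UL>\n", @list, "</UL>\n";
}

print $rendered{A2052};
```

Because `@list` is declared inside the loop body, nothing from TwelveGauge can leak into the A2052 output. In the original layout the equivalent would be emptying the array (`@Tarray = ();`) before each new file is read.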