http://www.perlmonks.org?node_id=230526

Wassercrats has asked for the wisdom of the Perl Monks concerning the following question:

Read the comments. Don't ask me to explain what the code does. I know what it used to do before I chiseled away at it in a failed attempt to isolate the bug (in Perl, I assume), and that is proprietary information. The only thing I know is that the output makes sense if you change $Build_Text .= $New_Line.'<br>'; (near the bottom) to $Build_Text .= $New_Line;
#!/usr/bin/perl -w
##############################################################################
print "Content-type: text/html\n\n";
$Count_Link = 14;
@line = ('<a href = "http://polisource.com">12(2345678901234)</a>');
for($Link_Count=0;$Link_Count<scalar(@line);$Link_Count++) {
    $line[$Link_Count] =~ /href.+?>(.*?)</;
    $Text = $1;
    if(length($Text) <= $Count_Link) {
        $line[$Link_Count] =~s /(href.+?>)(.*?<)/$1$Build_Text$2/;
        $Build_Text = "";
        next;
    }
    $Width2 = $Count_Link-1;
    $Order = substr($Text,0,$Count_Link);
    if($Order !~ /\(/) {
        $Build_Text .= $Order . '<br>';
        $line[$Link_Count] =~s /(href.+?>)\Q$Order\E/$1/;
        $Link_Count--;
        if (substr ($Text,$Count_Link,1) ne ' ') {
            if (!$Links_Count) {
                chomp($ORIGINAL_DOC = "$OriginalLinks[$OL_count]");
                $Links_Count .= " $ORIGINAL_DOC:\n";
            }
            $Links_Count .= "$Order";
            next;
        }
    }
    for($Break_Count = $Width2;$Break_Count>-1;$Break_Count--) {
        $character = substr($Order,$Break_Count,1);
        $c2 = ')';
        if ($character eq '[') {
            $c2 = ']';
        }
        ###################################
        ### "test 1" gets printed.
        if (!$Break_Count){ print "<br>test 1"; }
        ###################################
        if ((!$Break_Count)&&
            ($Build_Text =~ /.*?(?!<br>.*<br>)(.*)(<br>)/)&&
            (length $1 < $Count_Link)&&
            ($line[$Link_Count] =~ /href.+?>$Order($c2|.$c2)/)) {
            ######################################################
            ### If the above condition were true, "test 2" would be printed. Since it's not printed, this block is skipped.
            if (!$Break_Count){ print "<br>test 2"; }
            ######################################################
            $New_Line = substr($Order,1);
            $line[$Link_Count] =~s /(href.+?>).{$Count_Link}(.)/$1/;
            $Build_Text =~s /<br>$/$character<br>$New_Line$2<br>/;
            $Link_Count--;
            last;
        }
        elsif (!$Break_Count) {
            ###################################
            ### Nothing changed since "test 1"
            ### was printed, so why isn't
            ### "test 3" printed?
            if (!$Break_Count){ print "<br>test 3"; }
            ###################################
            $Build_Text .= $Order . '<br>';
            $line[$Link_Count] =~s /(href.+?>)\Q$Order\E/$1/;
            $Link_Count--;
            last;
        }
        elsif ($character eq '(') {
            $New_Line = substr($Order,0,$Break_Count);
            $Build_Text .= $New_Line.'<br>';
            $line[$Link_Count] =~s /(href.+?>)\Q$New_Line\E/$1/is;
            $Link_Count--;
            last;
        }
    }
}

update (broquaint): title change (was Why?)

Replies are listed 'Best First'.
Re: Strange problems with CGI script
by BrowserUk (Patriarch) on Jan 28, 2003 at 08:58 UTC

    Look closely at what you have in variable $c2!

    Unmatched ) before HERE mark in regex m/href.+?>(2345678901234()|.)) << HERE / at C:\test\junk.pl line 55.
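
The error above can be reproduced in isolation. The following is a minimal sketch, not the poster's code: it assumes $order and $c2 hold the values visible in the error message, and that the link looks the way it appears to look at that point in the script (after the leading "12" has been stripped). Because the pattern is built from variables, Perl compiles it at run time, so the script dies in the middle of the if condition and neither "test 2" nor "test 3" can be printed. Wrapping the interpolated values in \Q...\E makes the parentheses match literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed values, taken from the error message in this reply.
my $order = '(2345678901234';
my $c2    = ')';
# Assumed state of the link when the failing match runs.
my $line  = '<a href = "http://polisource.com">(2345678901234)</a>';

# Unescaped interpolation: the pattern becomes
#   href.+?>(2345678901234()|.))
# which has one closing parenthesis too many, so the match dies at run time.
my $survived = eval {
    my $hit = $line =~ /href.+?>$order($c2|.$c2)/;
    1;
};
my $error = $survived ? '' : $@;
print "unescaped pattern died: $error" unless $survived;

# Escaped interpolation: \Q...\E quotes the metacharacters, so the
# parentheses stored in the variables are matched as literal text.
my $matched = $line =~ /href.+?>\Q$order\E(\Q$c2\E|.\Q$c2\E)/ ? 1 : 0;
print "escaped pattern matched: $matched\n";
```

Run from the command line, the unescaped version reports the "Unmatched )" message on the terminal, which is how the error can be seen even when nothing reaches the web server's error log.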

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      I've had several programmers trying to figure this out, and nobody found that! How does one get to see that error? It wasn't in my error log. Thanks!
        Due to various issues people have with error logs, you should really debug from the command line, especially for compile time errors.

        You should also familiarize yourself with CGI::Carp (great for run time errors). I'm quite fond of keeping my own log, something resembling

        BEGIN {
            use CGI::Carp qw( fatalsToBrowser );
            CGI::Carp::carpout(\*LOGGY)
                if open(LOGGY, '>>'.__FILE__.'.err.log');
        }
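
Where CGI::Carp is not installed, the general idea of showing a fatal error in the browser can be approximated with core Perl alone. The sketch below is only an illustration of that idea and is not how CGI::Carp is implemented; run_guarded and fatal_as_html are hypothetical helper names introduced here, and the failing pattern is a made-up stand-in for the one in the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn an error message into an HTML-safe fragment.
sub fatal_as_html {
    my ($message) = @_;
    for ($message) {
        s/&/&amp;/g;
        s/</&lt;/g;
        s/>/&gt;/g;
    }
    return "<pre>Software error: $message</pre>\n";
}

# Hypothetical helper: run a code reference; return an empty string on
# success, or the trapped error rendered as HTML if the code died.
sub run_guarded {
    my ($code) = @_;
    return '' if eval { $code->(); 1 };
    return fatal_as_html($@);
}

print "Content-type: text/html\n\n";

# A stand-in for the failing match: interpolating ')' unescaped yields the
# pattern x()) and makes Perl die when it compiles the regex at run time.
my $c2 = ')';
print run_guarded( sub { 'x)' =~ /x($c2)/ } );
```

With something like this in place, a run-time regex error shows up in the page itself instead of silently ending the output after "test 1".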


        MJD says you can't just make shit up and expect the computer to know what you mean, retardo!
        ** The Third rule of perl club is a statement of fact: pod is sexy.

Re: Strange problems with CGI script
by PodMaster (Abbot) on Jan 28, 2003 at 09:29 UTC
    "Don't ask me to explain what the code does."

    What is it supposed to do?

    See: What makes a bad question?; Before You Post ...; How (Not) To Ask A Question; How to ask questions the smart way.

    It's a bad idea to try to parse HTML using regular expressions (especially like that). There are some pure-Perl HTML parsers available if you're really in a bind, like YAPE::HTML by japhy, a real Perl regex hacker.

    Anyway, my own HTML::LinkExtractor seems perfect for what you're trying to do, which I'm guessing is something like:

    #!/usr/bin/perl
    use CGI qw[:standard];
    use HTML::LinkExtractor;
    use strict;
    use warnings;

    print header,start_html;
    {
        my $count=0;
        my $lE = HTML::LinkExtractor->new(
            sub {
                my( $lE, $t ) = @_;
                if( $t->{tag} eq 'a' ) {
                    my ( $outer, $inner ) = $t->{_TEXT} =~ /(\d+)\((\d+)\)/;
                    return unless $outer and $inner;
                    print a( { -href => $t->{href} }, $count++ ),
                          qq{ [ $outer ][ $inner ]},
                          br;
                }
            }
        );
        $lE->strip(1);
        $lE->parse(\q[ <a href = "http://polisource.com">12(2345678901234)</a> ]);
    }
    print end_html;



      Extracting the links was the easy part, although maybe I wasn't perfect. The harder part was manipulating them the way I wanted, which seemed like a custom job, so I didn't research the various modules. The REAL problem was having Perl stop without giving me an error. Somehow, other experienced programmers missed it too. I'm not sure how. Sorry about asking incorrectly.