http://www.perlmonks.org?node_id=480593

mojobozo has asked for the wisdom of the Perl Monks concerning the following question:

First, please forgive my absence. It's been almost 2 years since my last confession, er... post. As a result of this and not playing with Perl in the meantime, I've forgotten a bit.

The question: I have this cgi script:
#!/usr/bin/perl

print "Content-type: text/html\n\n";
&print_return_page_top;

if (open (FILENAME, "..//..//test//index.html")) {
    $line = <FILENAME>;
    while ($line ne "") {
        print "$line";
        $line = <FILENAME>;
    }
}
else {
    print "Booger<br>";
}

&print_return_page_bottom;

##################################
sub print_return_page_top {
print <<RETURN_PAGE_TOP;
<HTML>
<HEAD>
<TITLE>Update Index</TITLE>
</HEAD>
<BODY>
Type your changes in the box below and then press Submit:
<BR>
<FORM NAME="update-index" ACTION="update-index.cgi" METHOD="get">
<TEXTAREA ROWS="20" COLS="100" NAME="index-data">
RETURN_PAGE_TOP
}
###################################

##################################
sub print_return_page_bottom {
print <<RETURN_PAGE_BOTTOM;
</TEXTAREA>
<BR>
<INPUT TYPE="Submit" NAME="Submit" VALUE="Submit">
</FORM>
</BODY>
</HTML>
RETURN_PAGE_BOTTOM
}
###################################


Works just as I want it, meaning it takes the html file I'm looking at and dumps it into the textarea. However, I want to grab just a portion of the file I'm reading from. I have <!-- Begin --> and <!-- End --> comments in the file and want to grab the stuff between them. I tried playing around a bit with the above script and these comment lines, but all I got was a blank textarea.

Can someone help me out?

Thanks!

_____________________________________________________
mojobozo
word (wûrd)
interj. Slang. Used to express approval or an affirmative response to
something. Sometimes used with up. Source

Replies are listed 'Best First'.
Re: file reading issues
by jdporter (Paladin) on Aug 03, 2005 at 18:47 UTC
    First of all, I would re-write
    $line = <FILENAME>;
    while ($line ne "") {
        print "$line";
        $line = <FILENAME>;
    }
    as
    while (<FILENAME>) {
        print;
    }
    if the expected behavior is to print out every line of the file.

    One way to tweak the above to get your desired output is as follows:

    # Skip everything up to and including the Begin marker line.
    while (<FILENAME>) {
        last if /<!-- Begin -->/;
    }
    # Print lines until the End marker line is reached.
    while (<FILENAME>) {
        last if /<!-- End -->/;
        print;
    }
      I would think that any text after <!-- Begin --> on the same line would not be printed, and neither would any text before <!-- End --> on its line.

      --------------------------------
      An idea is not responsible for the people who believe in it...
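
The same-line concern raised above can be handled while still reading line by line. What follows is only a minimal sketch, not jdporter's code: the sample data, the in-memory filehandle, and the variable names ($sample, $inside, $out) are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input; the markers share lines with real content.
my $sample = join '',
    "<p>header</p>\n",
    "<div><!-- Begin -->first\n",
    "second\n",
    "third<!-- End --></div>\n",
    "<p>footer</p>\n";

# Read from an in-memory filehandle so the sketch runs without a file.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";

my $inside = 0;    # true once the Begin marker has been seen
my $out    = '';
while (my $line = <$fh>) {
    if (!$inside) {
        # Skip lines until Begin; keep only what follows it on that line.
        next unless $line =~ s/^.*?<!-- Begin -->//;
        $inside = 1;
    }
    # Keep only what precedes the End marker, then stop reading.
    if ($line =~ s/<!-- End -->.*//s) {
        $out .= $line;
        last;
    }
    $out .= $line;
}
close $fh;

print $out;
```

Because the End substitution uses /s, the trailing newline of the End line is dropped along with everything after the marker.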

      OK, this one worked! Thanks, jdporter!

Re: file reading issues
by sgifford (Prior) on Aug 03, 2005 at 20:45 UTC
    The .. (dot-dot) operator was designed for this:
    while (<>) {
        if (/<!-- Begin -->/ .. /<!-- End -->/) {
            print;
        }
    }

    Or, more concisely:

    (/<!-- Begin -->/ .. /<!-- End -->/) && print while (<>);

    See Range Operators in perlop(1) for more information.
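
Note that the range operator is true on the marker lines themselves, so both comment lines are printed too. In scalar context it returns a sequence number: 1 on the first line of the range, and a value with "E0" appended on the last. A rough sketch of using that to drop the marker lines, with made-up sample data and hypothetical names (@lines, @kept, $seq):

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the HTML file.
my @lines = (
    "before\n",
    "<!-- Begin -->\n",
    "  kept one\n",
    "  kept two\n",
    "<!-- End -->\n",
    "after\n",
);

my @kept;
for (@lines) {
    # Scalar-context ".." is the flip-flop; it yields a sequence number
    # while the range is active and an empty string otherwise.
    my $seq = /<!-- Begin -->/ .. /<!-- End -->/;
    next unless $seq;                       # outside the section
    next if $seq == 1 or $seq =~ /E0\z/;    # skip the two marker lines
    push @kept, $_;
}

print @kept;
```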

Re: file reading issues
by dtr (Scribe) on Aug 03, 2005 at 20:08 UTC

    All manner of horrible things will happen to your current code if the page that you're editing happens to contain the string "</textarea>" anywhere in it.

    You should escape the HTML that you are printing inside the text box to get around this. At a minimum, replacing all instances of "<" with "&lt;" should do the trick. There are modules such as HTML::Sanitizer which try to do this in a more sophisticated way.
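
As a rough illustration of the escaping idea, here is a hand-rolled sketch. The function name escape_html and the sample fragment are hypothetical; escaping "&" as well as "<" is my addition, so that entities already present in the file survive a round trip through the form. The CPAN module HTML::Entities provides encode_entities for the same job.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text content.
# "&" must be handled first, or the "&" in "&lt;" would be escaped again.
sub escape_html {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical fragment containing the troublesome closing tag.
my $fragment = qq{<p class="x">Tom &amp; Jerry</p></textarea>};
print escape_html($fragment), "\n";
```

The browser decodes the entities when it displays the textarea, so the person editing still sees the original markup.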

Re: file reading issues
by kwaping (Priest) on Aug 03, 2005 at 18:45 UTC
    There are a couple ways I can think of to do that, offhand. If it's a small file and you can read it all into memory, you can do something like this:
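    my $file = '/path/to/file.txt';
    open(IN, "<$file") || die $!;
    read IN, my $html, -s $file;    # read the whole file in one call
    close(IN);
    # Keep only the text between the markers; the leading .*? and the
    # trailing .* discard the rest of the file.
    $html =~ s/.*?<!-- Begin -->(.*?)<!-- End -->.*/$1/s;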

    Or, if the file is large and you'd like to read it line by line, you might want to try setting a flag when Begin is encountered, then turning it off again when End is hit.
    my $flag = 0;
    while (<FILE>) {
        $flag = 1 if (/<!-- Begin -->/);
        $flag = 0 if (/<!-- End -->/);
        # As written, the Begin marker line itself is processed,
        # while the End marker line is not.
        process_line($_) if ($flag);
    }

      I notice you have a different method for reading an entire file into memory than I am used to seeing. I was wondering if there is any reason you are aware of that makes your way better or worse than the way I use (or if they are just different (TIMTOWTDI)). If there is, I'd love to hear about it.

      read IN, my $html, -s $file;         # your code
      my $file = do{ local $/; <IN> };     # The way I usually use.

      Other monks... can anybody tell me if there is an advantage in one way or the other?

      Update: The second line of code above is the way I am used to seeing... I forgot to complete the comment. My question regards the difference between using $/ as opposed to using read to slurp in an entire file.


      They say that time changes things, but you actually have to change them yourself.

      —Andy Warhol

        What is the way you're used to seeing? Maybe I am the one in need of enlightenment and your way is superior.

        To answer your question, there is no reason why I do it that way except that's the way I learned how to do it. Maybe there was a good reason to do it like that back then which has now been made moot by advances in Perl - I don't know.
        IMO, the difference will be what kind of buffered I/O gets performed. Fuller explanation: Re: Speed reading (files)

        One world, one people
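
For anyone who wants to try both idioms side by side, here is a small sketch. It assumes a plain file opened with the default I/O layers; the temporary file and the variable names ($via_read, $via_slurp) are invented for the example. It only checks that the two approaches return the same content and says nothing about speed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file so the sketch is self-contained.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line one\nline two\nline three\n";
close $tmp or die "Cannot close $path: $!";

# Way 1: read() with the file size reported by -s.
open my $in, '<', $path or die "Cannot open $path: $!";
read $in, my $via_read, -s $path;
close $in;

# Way 2: undefine the input record separator for one block,
# so a single readline returns the whole file.
open $in, '<', $path or die "Cannot open $path: $!";
my $via_slurp = do { local $/; <$in> };
close $in;

print $via_read eq $via_slurp ? "identical\n" : "different\n";
```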

Re: file reading issues
by bofh_of_oz (Hermit) on Aug 03, 2005 at 19:13 UTC
    A regex will parse the file just fine. Just grab the whole file into a variable, then do this:

    #Sample data, multiline
    $line = "<!-- Begin -->Line one\nLine two\nthree\nfour<!-- End -->";
    #process them: keep the text between the markers and
    #drop anything outside them
    $line =~ s/.*?<!-- Begin -->(.*)<!-- End -->.*/$1/s;
    print $line;

    I tried to do the same while reading the file line-by-line... The code was so ugly that I simply recommend reading in the whole file at once and using the multiline regexp above...

    HTH
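
One detail worth seeing in action is greedy versus non-greedy capture when a file holds more than one marked section. This is a sketch on made-up data; the names $html, @sections and $greedy are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical document with two marked sections and other text around them.
my $html = join '',
    "<html>\n",
    "<!-- Begin -->Line one\nLine two<!-- End -->\n",
    "<hr>\n",
    "<!-- Begin -->Line three<!-- End -->\n",
    "</html>\n";

# Non-greedy (.*?) stops at the nearest End marker; /s lets "." match
# newlines; /g in list context returns every captured section.
my @sections = $html =~ /<!-- Begin -->(.*?)<!-- End -->/sg;

# A greedy (.*) instead runs on to the last End marker in the string.
my ($greedy) = $html =~ /<!-- Begin -->(.*)<!-- End -->/s;

print "section: [$_]\n" for @sections;
print "greedy:  [$greedy]\n";
```

With a single Begin/End pair the two forms capture the same text, which is why the sample data above works either way.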


Re: file reading issues
by mojobozo (Monk) on Aug 03, 2005 at 19:17 UTC
    Follow-up question: How can I strip off the leading blank spaces on each line? Keep in mind that I'm using jdporter's snippet of code for grabbing between the comments. I like to indent my html for readability (mine) but don't need all those spaces in the form.

      Using jdporter's code:
      while (<FILENAME>) {
          last if /<!-- Begin -->/;
      }
      while (<FILENAME>) {
          last if /<!-- End -->/;
          s/^\s*//;    #<- new line here
          print;
      }
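
One thing to be aware of: \s also matches the newline, so s/^\s*// turns a blank line into an empty string and the blank line disappears from the output. If blank lines should survive, a variant that strips only spaces and tabs is one option. A small sketch with invented sample lines (@lines and @out are hypothetical names):

```perl
use strict;
use warnings;

# Hypothetical indented lines, including one blank line.
my @lines = ("    <p>indented</p>\n", "\n", "\t<b>tabbed</b>\n");

my @out;
for my $line (@lines) {
    my $copy = $line;
    # [ \t]+ matches spaces and tabs only, so the newline of a blank
    # line is left alone; \s* would consume that newline as well.
    $copy =~ s/^[ \t]+//;
    push @out, $copy;
}

print @out;
```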
Re: file reading issues
by wfsp (Abbot) on Aug 04, 2005 at 10:21 UTC
    I have <!-- Begin --> and <!-- End --> comments in the file and want to grab the stuff between them.
    I would consider using HTML::TokeParser. I use the following.

    #!/bin/perl5
    use strict;
    use warnings;
    use HTML::TokeParser;

    my $file = 'index.html';
    my $tp = HTML::TokeParser->new($file)
        or die "Couldn't parse $file: $!";

    my ($start, $html);
    while (my $tag = $tp->get_token) {
        # Start collecting after the opening comment marker.
        if ( $tag->[0] eq 'C' and $tag->[1] eq '<!-- article start -->' ) {
            $start++;
            next;
        }
        next unless $start;
        # Stop at the closing comment marker.
        if ( $tag->[0] eq 'C' and $tag->[1] eq '<!-- article end -->' ) {
            last;
        }
        # Append the original text of each token.
        $html .= $tag->[4] if $tag->[0] eq 'S';
        $html .= $tag->[1] if $tag->[0] eq 'T' or $tag->[0] eq 'C';
        $html .= $tag->[2] if $tag->[0] eq 'E';
    }
    print "$html\n";

    # Token layouts:
    # ["S", $tag, $attr, $attrseq, $text]
    # ["E", $tag, $text]
    # ["T", $text, $is_data]
    # ["C", $text]
    # ["D", $text]
    # ["PI", $token0, $text]