http://www.perlmonks.org?node_id=476448

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

I have another question regarding pattern matching.

I have a piece of html that I need to search through. The html looks like this:

    <!-- Start_of_revision--> revision1 <!-- End_of_revision-->
    <!-- Start_of_revision--> revision2 <!-- End_of_revision-->
    <!-- Start_of_revision--> revision3 <!-- End_of_revision-->

What I need to do is find the three revisions between <!-- Start_of_revision--> and <!-- End_of_revision-->.

I am using the following bit of code to do this, but I am only printing "revision1" and not the second two:

    my $file = $foo_bar_file;
    open (FILE,"<$file") || die $!;
    read FILE, my $text, -s $file;
    close(FILE);
    if($text =~ /<!--Start_of_revision-->(.*?)<!-- End_of_revision-->/sg) {
        print $1;
    }

I am slurping in the whole file, I am using .*? (i.e. non greedy), and I'm using the s and g modifiers. I thought that this would be enough to find all three occurrences of anything between <!-- Start_of_revision--> and <!-- End_of_revision-->, so what am I doing wrong??

Thanks in advance,

C J

Replies are listed 'Best First'.
Re: Regular expressions
by rev_1318 (Chaplain) on Jul 20, 2005 at 10:09 UTC
    if will only look for the first occurrence. You're looking for while...

    Paul

      <sheepish>
      Of course. Thanks!
      </sheepish>
      C J
Re: Regular expressions
by GrandFather (Saint) on Jul 20, 2005 at 10:16 UTC

    Two problems:

  • Missing space in <!--Start_of_revision-->
  • if rather than while

    A working version of your code in a form that is better for stand alone testing is:
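
A minimal sketch of such a fix, not necessarily the code originally posted. It assumes the sample HTML is held in a string ($sample) and uses a hypothetical helper, extract_revisions, so that it runs stand-alone. It restores the missing space after "<!--" and swaps if for while, so that each /g match resumes where the previous one ended.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the text found
# between each pair of start/end revision markers.
sub extract_revisions {
    my ($text) = @_;
    my @revisions;

    # In scalar context, /g makes each match start where the last one ended,
    # so the while loop visits every occurrence. /s lets . match newlines.
    # Note the space after "<!--" in the start marker.
    while ($text =~ /<!-- Start_of_revision-->(.*?)<!-- End_of_revision-->/sg) {
        push @revisions, $1;
    }
    return @revisions;
}

# Sample input taken from the question, joined into one string.
my $sample = join ' ',
    '<!-- Start_of_revision--> revision1 <!-- End_of_revision-->',
    '<!-- Start_of_revision--> revision2 <!-- End_of_revision-->',
    '<!-- Start_of_revision--> revision3 <!-- End_of_revision-->';

# Brackets make the captured whitespace around each revision visible.
print "[$_]\n" for extract_revisions($sample);
```

The captures keep the spaces that surround each revision in the sample; trim them afterwards if only the bare text is wanted.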


    Perl is Huffman encoded by design.

      That's an awful way of reading a file. It means every line must be pushed onto the stack. The OP's method was better, and the following is even better because it avoids a call to stat and works with all kinds of IO handles:

      my $text; { local $/; $text = <DATA>; }

      Visually, I prefer
      my $text = do { local $/; <DATA> };
      but I think I determined the above is equivalent to
      my $text; { local $/; my $temp = <DATA>; $text = $temp; }
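
A self-contained sketch of the local $/ idiom, for anyone who wants to try it outside this thread. It assumes an in-memory filehandle as a stand-in for DATA or a real file, and slurp_handle is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything remaining on an open filehandle.
sub slurp_handle {
    my ($fh) = @_;
    # local $/ undefines the input record separator for this block only,
    # so a single <$fh> returns the rest of the input as one string.
    return do { local $/; <$fh> };
}

# An in-memory filehandle stands in for a file; no stat call is involved,
# which is why this approach also works for handles that are not plain files.
my $content = "line one\nline two\nline three\n";
open my $fh, '<', \$content or die "Cannot open in-memory handle: $!";
my $text = slurp_handle($fh);
close $fh;

print length($text), " characters read\n";
```

Because $/ is localized inside the do block, its previous value is restored as soon as the block exits, so later line-by-line reads behave normally.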

        The reason I am a monk is to learn from the masters. I keep forgetting about $/! I hope I have learned :). Thank you.


        Perl is Huffman encoded by design.
Re: Regular expressions
by tphyahoo (Vicar) on Jul 20, 2005 at 15:52 UTC
    You could also use File::Slurp. I'm not sure whether it's advantageous in terms of speed or algorithm; the documentation claims it is. But I just like it for readability.