PerlMonks  

Regular expressions

by Anonymous Monk
on Jul 20, 2005 at 10:04 UTC ( [id://476448]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

I have another question regarding pattern matching.

I have a piece of html that I need to search through. The html looks like this:

<!-- Start_of_revision--> revision1 <!-- End_of_revision-->
<!-- Start_of_revision--> revision2 <!-- End_of_revision-->
<!-- Start_of_revision--> revision3 <!-- End_of_revision-->

What I need to do is find the three revisions between <!-- Start_of_revision--> and <!-- End_of_revision-->.

I am using the following bit of code to do this, but I am only printing "revision1" and not the second two:

my $file = $foo_bar_file;
open (FILE,"<$file") || die $!;
read FILE, my $text, -s $file;
close(FILE);
if($text =~ /<!--Start_of_revision-->(.*?)<!-- End_of_revision-->/sg) {
    print $1;
}

I am slurping in the whole file, I am using .*? (i.e. non greedy), and I'm using the s and g modifiers. I thought that this would be enough to find all three occurrences of anything between <!-- Start_of_revision--> and <!-- End_of_revision-->, so what am I doing wrong??

Thanks in advance,

C J

Replies are listed 'Best First'.
Re: Regular expressions
by rev_1318 (Chaplain) on Jul 20, 2005 at 10:09 UTC
    if will only look for the first occurrence. You're looking for while...

    Paul

      <sheepish>
      Of course. Thanks!
      </sheepish>
      C J
Re: Regular expressions
by GrandFather (Saint) on Jul 20, 2005 at 10:16 UTC

    Two problems:

  • Missing space in <!--Start_of_revision-->
  • if rather than while

    A working version of your code in a form that is better for stand alone testing is:
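A minimal sketch of such a stand-alone version (not necessarily the code originally posted), assuming the sample HTML is held in a heredoc instead of a file. The helper name `extract_revisions` is hypothetical; the only changes to the pattern in the question are the restored space and `while` in place of `if`.

```perl
use strict;
use warnings;

# Sample input from the question, held in a heredoc for stand-alone testing.
my $text = <<'HTML';
<!-- Start_of_revision--> revision1 <!-- End_of_revision-->
<!-- Start_of_revision--> revision2 <!-- End_of_revision-->
<!-- Start_of_revision--> revision3 <!-- End_of_revision-->
HTML

# extract_revisions is a hypothetical helper name, not from the thread.
sub extract_revisions {
    my ($html) = @_;
    my @revisions;
    # In scalar context, /g resumes after the previous match on each pass,
    # so "while" visits every match where "if" stops after the first.
    while ($html =~ /<!-- Start_of_revision-->(.*?)<!-- End_of_revision-->/sg) {
        push @revisions, $1;
    }
    return @revisions;
}

# Each capture keeps the spaces that surround it in the input.
print "[$_]\n" for extract_revisions($text);
```

In list context the same pattern returns every capture at once, so `my @revisions = $text =~ /.../sg;` would be a loop-free alternative.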


    Perl is Huffman encoded by design.

      That's an awful way of reading a file. It means every line must be pushed onto the stack. The OP's method was better, and the following is even better because it avoids a call to stat and works with all kinds of IO handles:

      my $text; { local $/; $text = <DATA>; }

      Visually, I prefer
      my $text = do { local $/; <DATA> };
      but I think I determined the above is equivalent to
      my $text; { local $/; my $temp = <DATA>; $text = $temp; }
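As a rough illustration of the point about IO handles, the sketch below applies the same `local $/` idiom to an in-memory handle and then checks that `$/` is back to its default once the block has exited. The helper name `slurp_handle` and the sample string are assumptions made for the example.

```perl
use strict;
use warnings;

# slurp_handle is a hypothetical helper name; it accepts any readable handle.
sub slurp_handle {
    my ($fh) = @_;
    # local $/ undefines the input record separator until the enclosing
    # block exits, so a single readline returns the rest of the stream.
    return do { local $/; scalar <$fh> };
}

# An in-memory handle stands in for a file, DATA, a pipe, or a socket.
my $source = "line one\nline two\nline three\n";
open my $fh, '<', \$source or die "Cannot open in-memory handle: $!";

my $text = slurp_handle($fh);
close $fh;

print length($text), " characters read\n";
print "separator restored\n" if defined $/ && $/ eq "\n";
```

Because nothing here depends on the handle being a real file, there is no need for `-s` or any other call that requires a file on disk.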

        The reason I am a monk is to learn from the masters. I keep forgetting about $/! I hope I have learned :). Thank you.


        Perl is Huffman encoded by design.
Re: Regular expressions
by tphyahoo (Vicar) on Jul 20, 2005 at 15:52 UTC
    You could also use File::Slurp. I'm not sure whether it offers any advantage in speed or algorithm, although the documentation claims it does. I just like it for readability.
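A small sketch of what that might look like. `read_file` is the function File::Slurp exports for this; the wrapper name `slurp_file`, the temporary file, and the fallback branch for systems without the module are all assumptions added for the example, since File::Slurp is not a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# slurp_file is a hypothetical wrapper, not part of File::Slurp.
sub slurp_file {
    my ($path) = @_;
    # Prefer File::Slurp when it is installed (it is not a core module).
    if (eval { require File::Slurp; 1 }) {
        # In scalar context read_file returns the whole file as one string.
        return scalar File::Slurp::read_file($path);
    }
    # Fallback: the local $/ idiom discussed earlier in the thread.
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;
    return scalar <$fh>;
}

# Write the sample HTML to a temporary file so the example is self-contained.
my $sample = "<!-- Start_of_revision--> revision1 <!-- End_of_revision-->\n";
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} $sample;
close $out or die "Cannot close $path: $!";

my $text = slurp_file($path);
print $text;
```

With the module installed, the whole wrapper reduces to `my $text = read_file($path);`, which is the readability argument in a nutshell.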

Node Type: perlquestion [id://476448]
Approved by GrandFather