PerlMonks  

Regular expression to match text between two tags (was: Help!! Regular Expressions)

by providencia (Pilgrim)
on Apr 12, 2001 at 11:30 UTC ( [id://71966]=perlquestion )

providencia has asked for the wisdom of the Perl Monks concerning the following question:

I can't seem to think this one through.
I want to get everything in between the :::'s including any new lines and special characters.
Here's what I have:
my ($thingy) = $_ =~ /^:::(.*)/;
This is the text:

:::This is a some text. Today you watered the dog and
bathed the plants. The server asked you what permission you
had to tell it what to do on it's day off. This was your day.:::

I am doing something very silly here.
I have looked over my copy of Mastering Regular Expressions and still haven't figured this out.
I swear one day I'm going to read it cover to cover.

2001-04-13 Edit by Corion : Changed title


Replies are listed 'Best First'.
Re: Help!! Regular Expressions
by Corion (Patriarch) on Apr 12, 2001 at 12:03 UTC

    As you have already read MRE, Death to Dot Star! will be clear to you; if it's not, read it and then go back to MRE :-).

    The Owl book (Mastering Regular Expressions) has some quite good hints on this, and it also demonstrates an interesting technique for avoiding .* or the lazy version .*?. Here's what I came up with after reading through my copy of MRE. Note that my solutions differ from physi's solution in that they take the shortest possible match, while physi's solution always takes the longest possible text between :::.

    #!/usr/bin/perl -w
    use strict;

    # Slurp the input into $data
    my $data;
    {
        local $/;
        $data = <DATA>;
    };

    # This is the naive way, using the non-greedy .*?
    if ($data =~ /:::(.*?):::/ms) {
        print "$1\n";
    } else {
        print "No match.\n";
    };

    # This is the "perfect" way (should be described in the Owl book
    # somewhere). It's much more specific about what it wants, and
    # thus longer and more complex :-)
    if ($data =~ /:::            # start
                  ([^:]*         # As many non-: as we can gobble
                    (?:
                      ::?[^:]+   # and then one or two :'s as long as they are
                    )*           # followed by something non-:
                  )
                  :::            # end
                 /msx            # And we want to match spanning lines
                                 # and use eXtended re syntax
       ) {
        print "$1\n";
    } else {
        print "No match.\n";
    };

    __DATA__
    Some foo:::This is a some text. Today you watered the dog and ::test:
    bathed the plants. The server asked you what permission you
    had to tell it what to do on it's day off. This was your day.:::
    More foo.
      Small point, but since you don't use . in your RE, you don't actually need the s modifier. You also don't match end or beginning of string, so you don't need the m modifier. Not that it matters much.
                      - Ant
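The point about the two modifiers can be checked with a small sketch. The sample text and variable names below are made up for illustration and are not taken from the thread; the only claim is the documented behaviour of /s (dot also matches newline) and /m (^ and $ also match at line boundaries).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, not the data from the thread.
my $text = "head\n:::first line\nsecond line:::\ntail\n";

# Without /s, '.' does not match "\n", so the capture cannot span lines.
my ($without_s) = $text =~ /:::(.*?):::/;

# With /s, '.' matches any character, newline included.
my ($with_s) = $text =~ /:::(.*?):::/s;

# /m only affects the anchors: ^ and $ then also match at line boundaries.
my $anchored_plain = $text =~ /^:::/  ? 1 : 0;   # ::: is not at the start of the string
my $anchored_m     = $text =~ /^:::/m ? 1 : 0;   # ::: is at the start of a line

print defined $without_s ? "no /s: matched\n" : "no /s: no match\n";
print "with /s: $with_s\n";
print "^::: without /m: $anchored_plain, with /m: $anchored_m\n";
```

A pattern built only from character classes such as [^:] needs neither modifier, because a negated class already matches newlines and no anchors are involved.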
Re: Help!! Regular Expressions
by dws (Chancellor) on Apr 12, 2001 at 11:44 UTC
    Sounds like you've put in a bit of work on figuring this out, so I won't cheat you out of that burst of pleasure that comes from solving a problem yourself. (I.e., a hint is all you're gonna get.)

    Take a look at perlre, and note which of the regular expression "modifiers" change what '.' will match.

Re: Help!! Regular Expressions
by deprecated (Priest) on Apr 12, 2001 at 11:49 UTC
    Mastering Regular Expressions is indeed a good book, but I've heard it called dated by quite a few people. Probably what you want to do in this circumstance is something that you should rarely do, and that is tinker with $/. In this case, I suspect this shall suffice:
    my $lineterm = $/;
    $/ = '';
    my $thingy = $_ =~ /:::(.*):::/;
    $/ = $lineterm;
    I don't believe Perl's regexes behave like those of sed(1) or awk(1), which require flags and modifiers to catch newlines. More information on the tricky $/ is of course in perlvar.

    and of course you should read that book cover to cover. it is extremely helpful. I learned a great deal from that book.

    brother dep.

    update: me and my itchy trigger finger. well i just spoke to dws in the chatterbox, and while i wasn't able to test this out, he was. apparently the re engine (at least as recently as 5.6) does not care what $/ is set to for the end-of-line character. which means this node isn't quite worthless because, yay, it taught me something. *grumble*
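A quick way to confirm the update is to run the same match under several values of $/. This is only a sketch with a made-up two-line string: $/ controls how the readline operator splits input, and the loop below suggests it has no effect on what '.' matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical two-line sample with the delimiters on different lines.
my $text = ":::line one\nline two:::";

my @results;
for my $sep ("\n", '', undef) {
    local $/ = $sep;    # change the input record separator for this iteration
    # No /s here, so '.' still refuses to cross the newline.
    push @results, ($text =~ /:::(.*):::/ ? 1 : 0);
}

# Only the /s modifier changes what '.' matches.
my $with_s = $text =~ /:::(.*):::/s ? 1 : 0;

print "without /s, for each value of \$/: @results\n";   # 0 0 0
print "with /s: $with_s\n";                              # 1
```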

    --
    Laziness, Impatience, Hubris, and Generosity.

      It may not have the latest RE stuff... but it really covers the basics, which, really, isn't that what most people need? Granted a new chapter on perl would be nice, but a lot of the stuff in that book won't be dated until real changes are made to the core of Regular Expressions
                      - Ant
Re: Help!! Regular Expressions
by bjelli (Pilgrim) on Apr 12, 2001 at 14:13 UTC

    another solution, without regular expressions. I assumed that you're reading from a file, and that you want to read in more than one lump of data:

    #!/usr/bin/perl -w
    use strict;

    {
        local $/ = ":::";    # read stuff separated by :::
        while (<DATA>) {
            s/:::$//;        # remove the ::: at the end of
                             # each lump of data
            print "found a piece -->$_<--\n\n";
        }
    }

    __DATA__
    Some foo:::This is a some text. It talks about CGI::Carp, but
    then suddenly changes subject to matching three :'s in a row :::
    The server asked you what permission you
    had to tell it what to do on it's day off. This was your day.:::
    More foo.

    hope that helps

    --
    Brigitte    'I never met a chocolate I didnt like'    Jellinek
    http://www.horus.com/~bjelli/         http://perlwelt.horus.at
Re: Help!! Regular Expressions
by Caillte (Friar) on Apr 12, 2001 at 15:26 UTC

    Proving TIMTOWTDI, and stirring the pot some more, how about this one? I used the same data as Corion so they can be easily compared.

    $data = <DATA>;
    (@array) = split /:::/, $data;
    print join "\n", @array;
    __DATA__
    Some foo:::This is a some text. Today you watered the dog and ::test: bathed the plants. The server asked you what permission you had to tell it what to do on it's day off. This was your day.::: More foo.

    This gives:

    Some foo
    This is a some text. Today you watered the dog and ::test: bathed the plants. The server asked you what permission you had to tell it what to do on it's day off. This was your day.
     More foo.

    A little more sophistication and you could quite easily edit how the data is read.
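One possible reading of "a little more sophistication", offered as a sketch rather than as what Caillte had in mind: take the whole input as one string, split on :::, and keep only the odd-numbered fields. That assumes the delimiters come in opening/closing pairs; the sample string and variable names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input with two delimited entries, one spanning a newline.
my $data = "Some foo:::first\nentry::: between :::second entry::: More foo.\n";

my @fields = split /:::/, $data;

# With paired delimiters, fields 1, 3, 5, ... are the text inside each pair;
# the even-numbered fields are the text around the pairs.
my @inside = @fields[ grep { $_ % 2 } 0 .. $#fields ];

print "entry: $_\n" for @inside;
```

If the input contains an odd number of ::: markers, the last odd-numbered field is an unterminated entry, so a real script would want to check for that.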

    $japh->{'Caillte'} = $me;

Re: Help!! Regular Expressions
by physi (Friar) on Apr 12, 2001 at 11:54 UTC
    Hmm what do you get with your regexp ?
    I guess only the first row ?
    Is the text in a file ?
    If so, try to set $/=undef:
    $/ = undef;
    $t = <FILEHANDLE>;
    $t =~ s/^:::(.*):::/$1/s;    # /s lets . match newlines, so the match can span lines
    print $t;
    This should work, if your memory is big enough to read the whole file into one $t.
    -----------------------------------
    --the good, the bad and the physi--
    -----------------------------------
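Corion's reply points out that a greedy (.*) takes the longest possible text between ::: markers, while the lazy (.*?) takes the shortest. A minimal sketch of the difference, using a made-up string with two delimited entries:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input containing two delimited entries.
my $text = "a:::one:::b:::two:::c";

# Greedy: .* grabs as much as it can, then backs off to the LAST :::.
my ($greedy) = $text =~ /:::(.*):::/s;

# Lazy: .*? grows one character at a time and stops at the FIRST closing :::.
my ($lazy) = $text =~ /:::(.*?):::/s;

print "greedy: $greedy\n";   # greedy: one:::b:::two
print "lazy:   $lazy\n";     # lazy:   one
```

With only one delimited entry in the input, both forms capture the same text, which is why the greedy version can look correct on the sample from the question.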
Re: Help!! Regular Expressions
by Rhandom (Curate) on Apr 12, 2001 at 20:38 UTC
    I may be oversimplifying things, but this should do:
    my ($thingy) = $_ =~ /:::(.*?):::/s;
    There is no global on it so it will get the first one. If by some chance there was more than one that you wanted, I would go with
    my (@thingies) = $_ =~ /:::(.*?)(?=:::)/sg; # ?= allows for it to start at the ::: on the next time through
    I keep seeing people talk about the dot star issue, but the dot star issue is not as much an issue if you have qualifiers before and after that force it to match a specific location.
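The effect of the (?=:::) lookahead in the global match can be seen in a small sketch. The input string is hypothetical. A pattern that consumes the closing ::: cannot reuse it as the opening marker of the next match, while the lookahead leaves it in place.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input where ::: acts as a separator between entries.
my $text = "x:::one:::two:::three:::y";

# Consuming the closing ::: means the next match must find a fresh opener,
# so every second entry is skipped.
my @consuming = $text =~ /:::(.*?):::/sg;

# The lookahead checks for the closing ::: without consuming it, so the
# same marker can open the next match.
my @lookahead = $text =~ /:::(.*?)(?=:::)/sg;

print "consuming: @consuming\n";   # consuming: one three
print "lookahead: @lookahead\n";   # lookahead: one two three
```

Which result is wanted depends on the data: if the markers come in opening/closing pairs, the consuming form returns only the bracketed text; if ::: is a plain separator, the lookahead form returns every piece between two markers.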
Re: Help!! Regular Expressions
by providencia (Pilgrim) on Apr 12, 2001 at 20:06 UTC
    Okay here's what I learned today from a friend.
    "You are sipping when you should be slurping."

    Of course if I had told you all what I was doing before
    that line you would have definitely seen my real problem.

    Okay here's close to what I did have (I burned that code long ago):

    #!/usr/bin/perl -w
    use strict;

    open(FH, '-');    # using STDIN instead of a real file.
    while (<FH>){
        my $entry = $_ =~ /:::(.*?):::/ms;
        open(FH2,'>filename');
        print FH2 "$entry";
    }
    Of course this didn't work because I was only working with
    the first line and what I wanted was further in the file
    and on multiple lines. Here's what I am using.
    #!/usr/bin/perl -w
    use strict;

    open(FH,'-');
    my $file = do {local $/; <FH>};
    #A really elegant way to slurp an entire file and fast too.
    #It creates a "disposable subroutine" in effect.
    #I can't take credit for coming up with it though.
    close FH;
    open(FH2,">filename");
    my $entry = $file =~ /:::(.*?):::/ms;    #Thanks Corion
    print FH2 "$1\n";

    Thanks everyone.
    Thanks Corion for telling me about the not so greedy (.*?).
    I liked the hint, dws.
    I learned from that hint but it wouldn't work for me until I slurped. :)
    I'm REAL glad to know about perlre until I can make time for Mastering Regular Expressions.
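The "sipping versus slurping" difference can be reproduced without STDIN by opening an in-memory filehandle on a string. This is a sketch with made-up input and variable names, not the poster's actual script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the delimited text spans two lines.
my $input = "Some foo:::first line\nsecond line:::\nMore foo.\n";

# Sipping: each read returns a single line, and no single line contains
# both the opening and the closing :::, so the pattern never matches.
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $line_hits = 0;
while (my $line = <$fh>) {
    $line_hits++ if $line =~ /:::(.*?):::/s;
}
close $fh;

# Slurping: with $/ undefined, one read returns the whole file, so the
# pattern sees both delimiters at once.
open $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $file = do { local $/; <$fh> };
close $fh;
my ($entry) = $file =~ /:::(.*?):::/s;

print "line-by-line matches: $line_hits\n";   # line-by-line matches: 0
print "slurped entry:\n$entry\n";
```

Taking the capture in list context, as in my ($entry) = ..., stores the captured text itself rather than the true/false result of the match.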
