Regular expression to match text between two tags (was: Help!! Regular Expressions)

providencia has asked for the wisdom of the Perl Monks concerning the following question:

Re: Help!! Regular Expressions by Corion (Patriarch) on Apr 12, 2001 at 12:03 UTC
As you already read MRE, Death to Dot Star! will be clear to you; if it's not, read it and then go back to MRE :-). The Owl book (Mastering Regular Expressions) has some quite good hints on this, and it also demonstrates an interesting technique for avoiding `.*` or the lazy version `.*?`.

Here's what I came up with after reading through my copy of MRE. Note that my solutions differ from physi's solution in that they take the shortest possible match, while physi's solution always takes the longest possible text between `:::`.

    #!/usr/bin/perl -w
    use strict;

    # Slurp the input into $data
    my $data;
    {
        local $/;
        $data = <DATA>;
    };

    # This is the naive way, using the non-greedy .*?
    if ($data =~ /:::(.*?):::/ms) {
        print "$1\n";
    } else {
        print "No match.\n";
    };

    # This is the "perfect" way (should be described in the Owl book
    # somewhere). It's much more specific about what it wants, and
    # thus longer and more complex :-)
    if ($data =~ /:::              # start
                  ([^:]*           # As many non-: as we can gobble
                    (?: ::?[^:]+   # and then one or two :'s as long as they are
                    )*             # followed by something non-:
                  )
                  :::              # end
                 /msx              # And we want to match spanning lines
                                   # and use eXtended re syntax
       ) {
        print "$1\n";
    } else {
        print "No match.\n";
    };

    __DATA__
    Some foo:::This is a some text.
    Today you watered the dog and ::test: bathed the plants.
    The server asked you what permission you had to tell it
    what to do on it's day off.
    This was your day.::: More foo.
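To make the shortest-versus-longest distinction concrete, here is a minimal sketch. The sample string with two delimited sections is made up for illustration and is not from the thread; the third pattern is the unrolled one from the post above with the comments removed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: two :::-delimited sections, the first spanning a newline.
my $text = "a:::first\npart:::b:::second:::c";

# Greedy: .* runs to the end of the string, then backtracks to the LAST ':::'.
my ($greedy) = $text =~ /:::(.*):::/s;

# Lazy: .*? grows one character at a time and stops at the FIRST closing ':::'.
my ($lazy) = $text =~ /:::(.*?):::/s;

# Unrolled: never uses '.', so it cannot run past a ':::' in the first place.
my ($unrolled) = $text =~ /:::([^:]*(?:::?[^:]+)*):::/;

print "greedy:   [$greedy]\n";
print "lazy:     [$lazy]\n";
print "unrolled: [$unrolled]\n";
```

On this input the lazy and unrolled patterns should capture the same text, while the greedy one swallows everything up to the last `:::`.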
Re: Re: Help!! Regular Expressions by suaveant (Parson) on Apr 12, 2001 at 16:25 UTC
Small point, but since you don't use `.` in your RE, you don't actually need the s modifier. You also don't anchor with `^` or `$`, so you don't need the m modifier. Not that it matters much.

- Ant
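A quick way to check this point is to run the same character-class pattern with and without the modifiers. This is only a sketch; the sample string is an assumption chosen to contain a newline between the delimiters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample with a newline between the ':::' markers.
my $text = "x:::line one\nline two:::y";

# A negated class such as [^:] matches "\n" regardless of /s,
# and without ^ or $ in the pattern /m has nothing to change.
my ($no_flags) = $text =~ /:::([^:]*):::/;
my ($with_ms)  = $text =~ /:::([^:]*):::/ms;

print "identical captures\n" if $no_flags eq $with_ms;
```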
Re: Help!! Regular Expressions by dws (Chancellor) on Apr 12, 2001 at 11:44 UTC
Sounds like you've put in a bit of work on figuring this out, so I won't cheat you out of that burst of pleasure that comes from solving a problem yourself. (I.e., a hint is all you're gonna get.) Take a look at perlre, and note which of the regular expression "modifiers" change what '.' will match.
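For readers who want to check their reading of perlre, the following sketch contrasts the two modifiers that come up in this thread. The strings are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = ":::one\ntwo:::";

# Without /s, '.' refuses to match "\n", so the pattern cannot span lines.
my $plain  = $text =~ /:::(.*?):::/  ? 1 : 0;
# With /s, '.' matches any character, newline included.
my $dotall = $text =~ /:::(.*?):::/s ? 1 : 0;

# /m is about anchors, not about '.': it lets ^ and $ match at line boundaries.
my $anchored_plain = "a\nb" =~ /^b$/  ? 1 : 0;
my $anchored_multi = "a\nb" =~ /^b$/m ? 1 : 0;

print "plain=$plain dotall=$dotall ",
      "anchored_plain=$anchored_plain anchored_multi=$anchored_multi\n";
```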
Re: Help!! Regular Expressions by deprecated (Priest) on Apr 12, 2001 at 11:49 UTC
Mastering Regular Expressions is indeed a good book, but I've heard it called dated by quite a few people. Probably what you want to do in this circumstance is something that you should rarely do, and that is tinker with `$/`. In this case, I suspect this shall suffice:

    my $lineterm = $/;
    $/ = '';
    my $thingy = $_ =~ /:::(.*):::/;
    $/ = $lineterm;

I don't believe perl's regexes behave like sed(1) or awk(1)'s in that they require flags and modifiers to catch newlines. More information on the tricky `$/` is of course in perlvar. And of course you should read that book cover to cover. It is extremely helpful. I learned a great deal from that book.

brother dep.

Update: me and my itchy trigger finger. Well, I just spoke to dws in the chatterbox, and while I wasn't able to test this out, he was. Apparently the RE engine (at least as recently as 5.6) does not care what `$/` is set to for the end-of-line character. Which means this node isn't quite worthless because, yay, it taught me something. *grumble*

-- Laziness, Impatience, Hubris, and Generosity.
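The point of the update can be checked with a small sketch like the one below: `$/` only affects how input is read (and what `chomp` removes), not how a pattern matches. The sample string is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = ":::one\ntwo:::";

# Try the same pattern under three different settings of $/.
my @results;
for my $sep ("\n", '', undef) {
    local $/ = $sep;
    # '.' still will not cross "\n", whatever $/ holds.
    push @results, ($text =~ /:::(.*):::/ ? 1 : 0);
}

# The /s modifier is what actually changes the behaviour of '.'.
my $with_s = $text =~ /:::(.*):::/s ? 1 : 0;

print "without /s: @results\n";
print "with /s:    $with_s\n";
```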
Re: Re: Help!! Regular Expressions by suaveant (Parson) on Apr 12, 2001 at 16:29 UTC
It may not have the latest RE stuff... but it really covers the basics, and really, isn't that what most people need? Granted, a new chapter on Perl would be nice, but a lot of the stuff in that book won't be dated until real changes are made to the core of Regular Expressions.

- Ant
Re: Help!! Regular Expressions by bjelli (Pilgrim) on Apr 12, 2001 at 14:13 UTC
Another solution, without regular expressions. I assumed that you're reading from a file, and that you want to read in more than one lump of data:

    #!/usr/bin/perl -w
    use strict;

    {
        local $/ = ":::";    # read stuff separated by :::
        while (<DATA>) {
            s/:::$//;        # remove the ::: at the end of
                             # each lump of data
            print "found a piece -->$_<--\n\n";
        }
    }

    __DATA__
    Some foo:::This is a some text. It talks about CGI::Carp,
    but then suddenly changes subject to matching
    three :'s in a row ::: The server asked you what
    permission you had to tell it what to do on it's day off.
    This was your day.::: More foo.

hope that helps

-- Brigitte 'I never met a chocolate I didnt like' Jellinek http://www.horus.com/~bjelli/ http://perlwelt.horus.at
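As a variation on the record-separator approach above, `chomp` removes a trailing copy of whatever `$/` currently holds, so it can stand in for the `s/:::$//` step. The sketch below reads from an in-memory filehandle so that it is self-contained; the sample text and variable names are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input; an in-memory filehandle stands in for a real file.
my $text = "Some foo:::first lump:::second lump::: More foo.";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

my @lumps;
{
    local $/ = ':::';              # each read returns text up to and including ':::'
    while (my $lump = <$fh>) {
        chomp $lump;               # strips the trailing ':::' if there is one
        push @lumps, $lump;
    }
}
close $fh;

print "found a piece -->$_<--\n" for @lumps;
```

Note that this, like the original, returns every lump, including the text before the first `:::` and after the last one.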
Re: Help!! Regular Expressions by Caillte (Friar) on Apr 12, 2001 at 15:26 UTC
Proving TIMTOWTDI, and stirring the pot some more, how about this one? I used the same data as Corion so they can be easily compared.

    $data = <DATA>;
    (@array) = split /:::/, $data;
    print join "\n", @array;
    __DATA__
    Some foo:::This is a some text. Today you watered the dog and ::test: bathed the plants. The server asked you what permission you had to tell it what to do on it's day off. This was your day.::: More foo.

This gives:

    Some foo
    This is a some text. Today you watered the dog and ::test: bathed the plants. The server asked you what permission you had to tell it what to do on it's day off. This was your day.
     More foo.

A little more sophistication and you could quite easily edit how the data is read.

`$japh->{'Caillte'} = $me;`
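One possible reading of "a little more sophistication" is to keep only the field between the first pair of delimiters. This is a sketch under that assumption, with made-up sample data; it is not necessarily what the poster had in mind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: text before, between, and after the ':::' markers.
my $data = "Some foo:::inside text::: More foo.";

# Split into at most three fields and keep only the middle one.
# The leading undef discards the text before the first ':::'.
my (undef, $inside) = split /:::/, $data, 3;

print "$inside\n";
```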
Re: Help!! Regular Expressions by physi (Friar) on Apr 12, 2001 at 11:54 UTC
Hmm, what do you get with your regexp? I guess only the first row? Is the text in a file? If so, try to set `$/ = undef`:

    $/ = undef;
    $t = <FILEHANDLE>;
    $t =~ s/^:::(.*):::/$1/;
    print $t;

This should work, if your memory is big enough to read the whole file into one `$t`.

    -----------------------------------
    --the good, the bad and the physi--
    -----------------------------------
Re: Help!! Regular Expressions by Rhandom (Curate) on Apr 12, 2001 at 20:38 UTC
I may be oversimplifying things, but this should do:

    my ($thingy) = $_ =~ /:::(.*?):::/s;

There is no global on it, so it will get the first one. If by some chance there was more than one that you wanted, I would go with

    my (@thingies) = $_ =~ /:::(.*?)(?=:::)/sg;
    # ?= allows for it to start at the ::: on the next time through

I keep seeing people talk about the dot star issue, but the dot star issue is not as much an issue if you have qualifiers before and after that force it to match a specific location.
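The effect of the lookahead is easiest to see when consecutive sections share a delimiter. In the sketch below (sample string invented for the example), the first pattern consumes the closing `:::` and so skips every other section, while the lookahead version leaves it in place to serve as the next opening delimiter.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample: three sections sharing their ':::' delimiters.
my $text = "a:::one:::two:::three:::b";

# Closing ':::' is consumed, so the next match starts after it.
my @consuming = $text =~ /:::(.*?):::/sg;        # ("one", "three")

# Closing ':::' is only peeked at, so it can open the next match.
my @lookahead = $text =~ /:::(.*?)(?=:::)/sg;    # ("one", "two", "three")

print "consuming: @consuming\n";
print "lookahead: @lookahead\n";
```

Which behaviour is wanted depends on whether the delimiters come in pairs or act as separators.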
Re: Help!! Regular Expressions by providencia (Pilgrim) on Apr 12, 2001 at 20:06 UTC
Okay, here's what I learned today from a friend: "You are sipping when you should be slurping." Of course, if I had told you all what I was doing before that line, you would have definitely seen my real problem. Okay, here's close to what I did have (I burned that code long ago):

    #!/usr/bin/perl -w
    use strict;

    open(FH, '-');    # using STDIN instead of a real file.
    while (<FH>) {
        my $entry = $_ =~ /:::(.*?):::/ms;
        open(FH2, '>filename');
        print FH2 "$entry";
    }

Of course this didn't work, because I was only working with one line at a time and what I wanted was further in the file and on multiple lines. Here's what I am using.

    #!/usr/bin/perl -w
    use strict;

    open(FH, '-');
    my $file = do { local $/; <FH> };
    # A really elegant way to slurp an entire file and fast too.
    # It creates a "disposable subroutine" in effect.
    # I can't take credit for coming up with it though.
    close FH;
    open(FH2, ">filename");
    my $entry = $file =~ /:::(.*?):::/ms;    # Thanks Corion
    print FH2 "$1\n";

Thanks everyone. Thanks Corion for telling me about the not so greedy `(.*?)`. I liked the hint, dws. I learned from that hint, but it wouldn't work for me until I slurped. :) I'm REAL glad to know about perlre until I can make time for Mastering Regular Expressions.
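One detail in the code above is worth spelling out: a match assigned to a plain scalar yields a true/false value, not the captured text, which is why the working version prints `$1` rather than `$entry`. The sketch below shows the slurp idiom together with a list-context capture. It uses an in-memory filehandle and invented sample text so that it runs on its own; a real script would open STDIN or a file instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input standing in for STDIN or a file.
my $input = "Some foo:::line one\nline two::: More foo.\n";
open my $in, '<', \$input or die "Cannot open in-memory handle: $!";

# Slurp: inside the do block $/ is undefined, so one read returns everything.
# The previous value of $/ is restored when the block exits.
my $file = do { local $/; <$in> };
close $in;

# Scalar context: the result is only whether the match succeeded.
my $matched = $file =~ /:::(.*?):::/s;

# List context (note the parentheses): the result is the captured text.
my ($entry) = $file =~ /:::(.*?):::/s;

print "matched: $matched\n";
print "entry:   $entry\n";
```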

