Regular Expression tricky newline problem

jkva has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I am trying to accomplish the following : Say I have a file called info.txt :

Line1 : Dit is de eerste regel
Line2 : Dit is de tweede regel
Line3 : Dit is de derde regel
Line4 : Dit is de vierde regel

I have slurped this into a scalar called $contents using open and my $contents = join('', <FILE>). This all works right.

My objective is then to create a regular expression that captures what comes after "Line3 :" until the end of that line, so basically until it meets a newline after that.
I have tried several things for quite a while now and don't seem to be getting any closer. The greedy .* operator seems to get me the closest, but since I have to ignore newlines using the /s flag I can't get this to work.

I would be grateful for any help.... knowing PM I will be getting a "D'oh why didn't I think of that" answer ;-)

-- jkva


Re: Regular Expression tricky newline problem by saintmike (Vicar) on Jan 02, 2006 at 22:13 UTC
Greedy matching works as long as you're not using the `/s` modifier: `use strict; my $string = join '', <DATA>; if($string =~ /^Line3 : (.*)/m) { print "$1\n"; } __DATA__ Line1 : Dit is de eerste regel Line2 : Dit is de tweede regel Line3 : Dit is de derde regel Line4 : Dit is de vierde regel` The `/m` modifier is necessary to have the `^` anchor match at the beginning of any line in a multi-line string. The greedy `.*` will then match everything up to the end of that line, because without `/s` the dot does not match a newline. Had you used the `/s` modifier, the greedy `.*` would have matched newlines as well and therefore gobbled up everything until the end of the multi-line string.
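To make the difference between the modifiers concrete, here is a small sketch that runs the same greedy capture once with `/m` and once with `/ms`. The sub name `capture_line3` and the heredoc standing in for info.txt are assumptions made for illustration, not part of the original posts.

```perl
use strict;
use warnings;

# Sample data standing in for info.txt (an assumption for this sketch).
my $contents = <<'END';
Line1 : Dit is de eerste regel
Line2 : Dit is de tweede regel
Line3 : Dit is de derde regel
Line4 : Dit is de vierde regel
END

# Hypothetical helper: capture whatever follows "Line3 : ".
# When $dot_matches_newline is true, /s is added so "." also matches "\n".
sub capture_line3 {
    my ($text, $dot_matches_newline) = @_;
    my ($rest) = $dot_matches_newline
        ? $text =~ /^Line3 : (.*)/ms
        : $text =~ /^Line3 : (.*)/m;
    return $rest;
}

my $without_s = capture_line3($contents, 0);
my $with_s    = capture_line3($contents, 1);

print "without /s: [$without_s]\n";
print "with /s:    [$with_s]\n";
```

Without `/s` the capture stops at the end of the third line; with `/s` it also contains the newline and the whole of the fourth line.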
Re: Regular Expression tricky newline problem by tirwhan (Abbot) on Jan 02, 2006 at 22:13 UTC
Use a non-greedy match: `#!/usr/bin/perl use strict; use warnings; local $/=undef; my $contents=<DATA>; my ($thirdline) = $contents =~ m/Line3 : (.*?)\n/s; print $thirdline."\n"; __DATA__ Line1 : Dit is de eerste regel Line2 : Dit is de tweede regel Line3 : Dit is de derde regel Line4 : Dit is de vierde regel` Output: `Dit is de derde regel` Or you could match against a negated character class: `my ($thirdline) = $contents =~ m/Line3 : ([^\n]+)/s;` A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox
Re: Regular Expression tricky newline problem by bobf (Monsignor) on Jan 02, 2006 at 22:16 UTC
Use '?' to get a nongreedy match up to the first newline: `$data =~ m/Line3 : (.*?)\n/s;` You can also use a greedy match with a negated character class: `$data =~ m/Line3 : ([^\n]*)/s;` My test code follows: `use warnings; use strict; my $data = join( '', <DATA> ); print "[$data]\n\n"; $data =~ m/Line3 : (.*?)\n/s or die; print "match: [$1]\n"; $data =~ m/Line3 : ([^\n]*)/s or die; print "match: [$1]\n"; __DATA__ Line1 : Dit is de eerste regel Line2 : Dit is de tweede regel Line3 : Dit is de derde regel Line4 : Dit is de vierde regel`
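The two patterns suggested in the replies above behave the same on the sample file, but they can differ when the wanted line is the last one and the file lacks a trailing newline. The following is only a sketch of that edge case; the sub names `lazy_until_newline` and `negated_class`, and the shortened sample text, are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: lazy match that requires a literal "\n" after the capture.
sub lazy_until_newline {
    my ($text, $label) = @_;
    my ($rest) = $text =~ /\Q$label\E : (.*?)\n/s;
    return $rest;
}

# Hypothetical helper: negated character class, no newline required.
sub negated_class {
    my ($text, $label) = @_;
    my ($rest) = $text =~ /\Q$label\E : ([^\n]*)/;
    return $rest;
}

# Note the missing newline after the final line (an assumed edge case).
my $contents = "Line1 : eerste\nLine2 : tweede\nLine3 : derde\nLine4 : vierde";

for my $label ('Line3', 'Line4') {
    my $lazy    = lazy_until_newline($contents, $label);
    my $negated = negated_class($contents, $label);
    printf "%s lazy=[%s] negated=[%s]\n",
        $label,
        defined $lazy    ? $lazy    : 'no match',
        defined $negated ? $negated : 'no match';
}
```

Because the lazy form insists on a `\n` after the capture, it finds nothing for the unterminated last line, while the negated class simply stops at the end of the string.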
Re: Regular Expression tricky newline problem by GrandFather (Saint) on Jan 02, 2006 at 22:37 UTC
Some sample code with "this is what I get", and "this is what I want" would help understand where you are having a problem. The following should be a good starting point, if not the stimulus for a D'oh moment :). `use strict; use warnings; my $lines = do {local $/; <DATA>}; my ($line3) = $lines =~ /Line 3 : (.*?)\n/; print ">$line3<"; __DATA__ Line 1 : Dit is de eerste regel Line 2 : Dit is de tweede regel Line 3 : Dit is de derde regel Line 4 : Dit is de vierde regel` Prints: `>Dit is de derde regel<` DWIM is Perl's answer to Gödel
Re: Regular Expression tricky newline problem by graff (Chancellor) on Jan 03, 2006 at 03:50 UTC
You've got answers for doing the appropriate regex on the slurped file data, as well as suggestions on improving how you do the slurp, so I'd just like to add that I wouldn't use a whole-file slurp into a scalar in a case like this. The task appears to be line-oriented, so it would make sense to stick with line-oriented handling of the data. Depending on what else might need to be done with the file contents in the same script (whether you need to do things with other lines besides "Line 3"), you could either read the whole file into an array of lines and use grep on the array, or else use grep directly on the line-oriented file-read operator: `# load file into an array of lines, and use "Line 3": my @lines = <FILE>; my ( $keeper ) = grep /^Line 3 : /, @lines; # or just get "Line 3" from the file, and skip the rest: #my ($keeper) = grep /^Line 3 : /, <FILE>; # (update: added parens around $keeper, as per Aristotle's correction) # either way, remove the unwanted content from the kept line: $keeper =~ s/Line 3 : //;`
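A plain read loop is another way to keep the handling line-oriented, and unlike grep over the file-read operator it can stop reading as soon as the line is found. This is a minimal sketch under stated assumptions: the sub `find_line` is hypothetical, and an in-memory filehandle with made-up text stands in for info.txt.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text after "$label : " on the first
# line that starts with it, or undef if no line matches.
sub find_line {
    my ($fh, $label) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        # Return (and so stop reading) as soon as the wanted line is found.
        return $1 if $line =~ /^\Q$label\E : (.*)/;
    }
    return;
}

# Made-up sample text, read through an in-memory filehandle.
my $sample = join '', map { "Line $_ : regel $_\n" } 1 .. 4;
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $keeper = find_line($fh, 'Line 3');
close $fh;

print defined $keeper ? "[$keeper]\n" : "not found\n";
```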
Re^2: Regular Expression tricky newline problem by Aristotle (Chancellor) on Jan 03, 2006 at 13:16 UTC
Careful, you’re invoking grep in scalar context. `$keeper` will only contain the count of matches. This has to be written with a parenthesised my, like so: `my ( $keeper ) = grep /^Line 3 : /, @lines;` However, that always goes through the entire data, regardless of where the match is found. A better way would be List::Util’s `first`; with which the context does not matter either: `use List::Util qw( first ); my $keeper = first { /^Line 3 : / } @lines;` Makeshifts last the longest.
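The context difference is easy to see side by side. The sketch below assumes the lines are already in an array (hard-coded here rather than read from a file) and compares grep in scalar context, grep in list context, and `first` from List::Util.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Lines hard-coded for the sketch; normally these would come from <FILE>.
my @lines = (
    "Line 1 : Dit is de eerste regel\n",
    "Line 2 : Dit is de tweede regel\n",
    "Line 3 : Dit is de derde regel\n",
    "Line 4 : Dit is de vierde regel\n",
);

# Scalar context: grep returns the number of matching elements.
my $count = grep { /^Line 3 : / } @lines;

# List context (parenthesised my): the first matching element is kept.
my ($from_grep) = grep { /^Line 3 : / } @lines;

# first() stops at the first match and returns it regardless of context.
my $from_first = first { /^Line 3 : / } @lines;

# Strip the label and the trailing newline from a copy of the kept line.
(my $text = $from_first) =~ s/^Line 3 : //;
chomp $text;

print "count=$count text=[$text]\n";
```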
Re: Regular Expression tricky newline problem by davidrw (Prior) on Jan 03, 2006 at 02:58 UTC
"I have slurped this into a scalar called $contents using `open` and `my $contents = join('', <FILE>)`. This all works right." That works, of course .. just wanted to share my immediate thought of this node: Perl Idioms Explained - my $string = do { local $/; <FILEHANDLE> };
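For reference, the slurp idiom can be sketched as follows. The sub name `slurp` and the in-memory filehandle are assumptions for the example; with a real file you would open info.txt instead.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything left on a filehandle into one scalar.
sub slurp {
    my ($fh) = @_;
    # "local $/" sets the input record separator to undef inside this
    # block only, so a single <$fh> read returns the rest of the file.
    my $contents = do { local $/; <$fh> };
    return $contents;
}

# Made-up sample text, read through an in-memory filehandle.
my $sample = "Line1 : een\nLine2 : twee\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $contents = slurp($fh);
close $fh;

print length($contents), " characters read\n";
```

Because `$/` is localised, it returns to its previous value once the `do` block ends, so later line-by-line reads in the same script are unaffected.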
