http://www.perlmonks.org?node_id=520462

jkva has asked for the wisdom of the Perl Monks concerning the following question:

Fellow monks,

I am trying to accomplish the following: say I have a file called info.txt:

    Line1 : Dit is de eerste regel
    Line2 : Dit is de tweede regel
    Line3 : Dit is de derde regel
    Line4 : Dit is de vierde regel

I have slurped this into a scalar called $contents using open and my $contents = join('', <FILE>). This all works right.

My objective is then to create a regular expression that captures what comes after "Line3 :" until the end of that line, so basically until it meets a newline after that.
I have tried several things for quite a while now and don't seem to be getting closer. The greedy .* operator seems to get me the closest, but since I have to ignore newlines using the /s flag, I can't get this to work.

I would be grateful for any help.... knowing PM I will be getting a "D'oh why didn't I think of that" answer ;-)

-- jkva

Re: Regular Expression tricky newline problem
by saintmike (Vicar) on Jan 02, 2006 at 22:13 UTC
    Greedy matching works as long as you're not using the /s modifier:
    use strict;

    my $string = join '', <DATA>;

    if ($string =~ /^Line3 : (.*)/m) {
        print "$1\n";
    }

    __DATA__
    Line1 : Dit is de eerste regel
    Line2 : Dit is de tweede regel
    Line3 : Dit is de derde regel
    Line4 : Dit is de vierde regel
    The /m modifier is necessary to have the ^ anchor match the beginning of any line in a multi-line string. The greedy .* will then match anything until the end of that line.

    Had you used the /s modifier, the greedy .* would have matched newlines as well and therefore gobbled up everything until the end of the multi-line string.
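
To make the difference between the two modifiers concrete, here is a minimal sketch that runs the same pattern with /m alone and with /m plus /s. The variable names $with_m and $with_ms are illustrative only, and the data is inlined with a here-document instead of a __DATA__ section.

```perl
use strict;
use warnings;

# Sample data copied from the question, inlined for the sketch.
my $string = <<'END';
Line1 : Dit is de eerste regel
Line2 : Dit is de tweede regel
Line3 : Dit is de derde regel
Line4 : Dit is de vierde regel
END

# /m only: ^ matches at the start of every line, and . stops at a newline.
my ($with_m) = $string =~ /^Line3 : (.*)/m;

# /m and /s together: . also matches newlines, so .* runs to the end of the string.
my ($with_ms) = $string =~ /^Line3 : (.*)/ms;

print "with /m : [$with_m]\n";
print "with /ms: [$with_ms]\n";
```

With /m alone the capture is just the rest of the third line; with /s added, the capture also contains the newline and the whole fourth line.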

Re: Regular Expression tricky newline problem
by tirwhan (Abbot) on Jan 02, 2006 at 22:13 UTC

    Use a non-greedy match:

    #!/usr/bin/perl
    use strict;
    use warnings;

    local $/ = undef;
    my $contents = <DATA>;

    my ($thirdline) = $contents =~ m/Line3 : (.*?)\n/s;
    print $thirdline . "\n";

    __DATA__
    Line1 : Dit is de eerste regel
    Line2 : Dit is de tweede regel
    Line3 : Dit is de derde regel
    Line4 : Dit is de vierde regel
    Output:
    Dit is de derde regel
    Or you could match against a negated character class:
    my ($thirdline) = $contents =~ m/Line3 : ([^\n]+)/s;

    A computer is a state machine. Threads are for people who can't program state machines. -- Alan Cox
Re: Regular Expression tricky newline problem
by bobf (Monsignor) on Jan 02, 2006 at 22:16 UTC

    Use '?' to get a nongreedy match up to the first newline:

    $data =~ m/Line3 : (.*?)\n/s;

    You can also use a greedy match with a negated character class:

    $data =~ m/Line3 : ([^\n]*)/s;

    My test code follows:

    use warnings;
    use strict;

    my $data = join( '', <DATA> );
    print "[$data]\n\n";

    $data =~ m/Line3 : (.*?)\n/s or die;
    print "match: [$1]\n";

    $data =~ m/Line3 : ([^\n]*)/s or die;
    print "match: [$1]\n";

    __DATA__
    Line1 : Dit is de eerste regel
    Line2 : Dit is de tweede regel
    Line3 : Dit is de derde regel
    Line4 : Dit is de vierde regel
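
The two patterns above behave the same on the sample file, but they can differ when the wanted line is the last one and has no trailing newline. The sketch below uses a hypothetical two-line input (not from the original post) to show that case; $lazy and $negated are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical input: the wanted line comes last and has no trailing newline.
my $data = "Line1 : Dit is de eerste regel\nLine3 : Dit is de derde regel";

# The non-greedy form requires a literal newline after the capture, so it fails here.
my ($lazy) = $data =~ m/Line3 : (.*?)\n/s;

# The negated character class simply stops at a newline or at the end of the string.
my ($negated) = $data =~ m/Line3 : ([^\n]*)/s;

print 'lazy:    ', ( defined $lazy ? "[$lazy]" : 'no match' ), "\n";
print 'negated: ', "[$negated]\n";
```

If the file might lack a final newline, the negated character class is the more forgiving of the two.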

Re: Regular Expression tricky newline problem
by GrandFather (Saint) on Jan 02, 2006 at 22:37 UTC

    Some sample code with "this is what I get", and "this is what I want" would help understand where you are having a problem. The following should be a good starting point, if not the stimulus for a D'oh moment :).

    use strict;
    use warnings;

    my $lines = do { local $/; <DATA> };
    my ($line3) = $lines =~ /Line 3 : (.*?)\n/;
    print ">$line3<";

    __DATA__
    Line 1 : Dit is de eerste regel
    Line 2 : Dit is de tweede regel
    Line 3 : Dit is de derde regel
    Line 4 : Dit is de vierde regel

    Prints:

    >Dit is de derde regel<

    DWIM is Perl's answer to Gödel
Re: Regular Expression tricky newline problem
by graff (Chancellor) on Jan 03, 2006 at 03:50 UTC
    You've got answers for doing the appropriate regex on the slurped file data, as well as suggestions on improving how you do the slurp, so I'd just like to add that I wouldn't use a whole-file slurp into a scalar in a case like this.

    The task appears to be line-oriented, so it would make sense to stick with line-oriented handling of the data. Depending on what else might need to be done with the file contents in the same script (whether you need to do things with other lines besides "Line 3"), you could either read the whole file into an array of lines and use grep on the array, or else use grep directly on the line-oriented file-read operator:

    # load file into an array of lines, and use "Line 3":
    my @lines = <FILE>;
    my ( $keeper ) = grep /^Line 3 : /, @lines;

    # or just get "Line 3" from the file, and skip the rest:
    #my ($keeper) = grep /^Line 3 : /, <FILE>;
    # (update: added parens around $keeper, as per Aristotle's correction)

    # either way, remove the unwanted content from the kept line:
    $keeper =~ s/Line 3 : //;
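
The line-oriented approach described above can also be sketched as a plain read loop that stops as soon as the wanted line turns up. This is only an illustration: it reads from an in-memory filehandle (opened on a reference to $text) as a stand-in for the real file, and the names $text, $fh and $line are assumptions, not part of the original post.

```perl
use strict;
use warnings;

# Stand-in for info.txt, using the "Line 3" spelling from this reply.
my $text = <<'END';
Line 1 : Dit is de eerste regel
Line 2 : Dit is de tweede regel
Line 3 : Dit is de derde regel
Line 4 : Dit is de vierde regel
END

# Open a read handle on the string; a real script would open the file instead.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my $keeper;
while ( my $line = <$fh> ) {
    # Without /s, .* stops before the newline, so no chomp is needed.
    if ( $line =~ /^Line 3 : (.*)/ ) {
        $keeper = $1;
        last;    # stop reading once the line is found
    }
}
close $fh;

print defined $keeper ? "$keeper\n" : "no match\n";
```

Because the capture is taken inside the loop, the separate substitution step that strips the "Line 3 : " prefix is not needed in this variant.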

      Careful, you’re invoking grep in scalar context. $keeper will only contain the count of matches. This has to be written with a parenthesised my, like so:

      my ( $keeper ) = grep /^Line 3 : /, @lines;

      However, that always goes through the entire data, regardless of where the match is found. A better way would be List::Util’s first; with which the context does not matter either:

      use List::Util qw( first );
      my $keeper = first { /^Line 3 : / } @lines;

      Makeshifts last the longest.
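
As a rough illustration of the point about grep scanning everything while first stops early, the sketch below counts how many lines each one tests. The counters $grep_tests and $first_tests and the hard-coded @lines array are added for the demonstration only.

```perl
use strict;
use warnings;
use List::Util qw( first );

# Stand-in for the lines read from the file.
my @lines = (
    "Line 1 : Dit is de eerste regel\n",
    "Line 2 : Dit is de tweede regel\n",
    "Line 3 : Dit is de derde regel\n",
    "Line 4 : Dit is de vierde regel\n",
);

my ( $grep_tests, $first_tests ) = ( 0, 0 );

# grep evaluates its block for every element, even after a match.
my ($from_grep) = grep { $grep_tests++; /^Line 3 : / } @lines;

# first returns as soon as the block is true for one element.
my $from_first = first { $first_tests++; /^Line 3 : / } @lines;

print "grep tested $grep_tests lines, first tested $first_tests lines\n";
```

Both calls return the same line; the difference only matters for long inputs or matches near the start.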

Re: Regular Expression tricky newline problem
by davidrw (Prior) on Jan 03, 2006 at 02:58 UTC